Closed johnny0213 closed 2 years ago
I have used my data to fine-tune your pretrained model, which gives decent results after 60 epochs; a sample image is below. My question is that the image quality of the result images (the right part) seems lower than that of the originals (the left part), even though the resolutions are the same (1024*960). Lower quality will greatly affect downstream tasks such as OCR, so I wish to improve the quality of the result images. Do you have any idea how to improve it? Your kind advice will be much appreciated.

[sample image: original on the left, dewarped result on the right]
Hi, a low-resolution image (1024*960) may lead to image-quality degradation. You can try setting is_scaling=True and changing the size of img. You can also try flatByRegressWithClassiy_triangular_v3_RGB in utils.py to see if it works.
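For illustration, feeding the network a 2x-upscaled input could be sketched as below. `resize_nearest` is just a stand-in I wrote for this comment; the repository's `dataloader.resize_image` may use different interpolation and argument order.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Minimal nearest-neighbour resize (stand-in for dataloader.resize_image)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source col for each output col
    return img[rows][:, cols]

# synthetic 1024*960 image, doubled to 2048*1920
img = np.zeros((1024, 960, 3), dtype=np.uint8)
doubled = resize_nearest(img, 1024 * 2, 960 * 2)
print(doubled.shape)  # (2048, 1920, 3)
```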
Great to hear from you! I want to try it out.

- However, I cannot find flatByRegressWithClassiy_triangular_v3_RGB in utils.py; only the _v2 version is there.
- I use another of your projects, Distorted-Image-With-Flow-main, to pre-process my images and generate the training set. Can I simply change save_img_shape to 512*4, 480*4 in perturbed_images_generation_multiProcess.py to generate images of size 2048*1920? Do I need to change anything else, such as reduce_value in the following lines?
```python
def save_img(self, m, n, fold_curve='fold', repeat_time=4, relativeShift_position='relativeShift_v2'):
    origin_img = cv2.imread(self.path, flags=cv2.IMREAD_COLOR)
    save_img = None
    save_img_shape = [512*2, 480*2]  # <=== change here
    reduce_value = np.random.choice([8*2, 16*2, 24*2, 32*2, 40*2, 48*2], p=[0.1, 0.2, 0.4, 0.1, 0.1, 0.1])  # <=== see here
    base_img_shrink = save_img_shape[0] - reduce_value
```
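If the shrink offsets are meant to scale with the canvas, as the *2 factors above suggest, a 4x configuration targeting 2048*1920 might look like the sketch below. Whether every other constant in perturbed_images_generation_multiProcess.py must scale the same way is an assumption on my part.

```python
import random

scale = 4  # assumed: 512*4 by 480*4 canvas -> 2048*1920 output
save_img_shape = [512 * scale, 480 * scale]
# scale the shrink offsets with the canvas, mirroring the *2 pattern
reduce_choices = [8 * scale, 16 * scale, 24 * scale, 32 * scale, 40 * scale, 48 * scale]
weights = [0.1, 0.2, 0.4, 0.1, 0.1, 0.1]

reduce_value = random.choices(reduce_choices, weights=weights)[0]
base_img_shrink = save_img_shape[0] - reduce_value
print(save_img_shape, reduce_value, base_img_shrink)
```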
- In utils.py of your main project Dewarping-Document-Image-By-Displacement-Flow-Estimation-main, if is_scaling=True the image is already resized to 1024*2, 960*2. So I just need to set is_scaling=True and don't need to change anything else, is that right?
```python
def flatByRegressWithClassiy_triangular_v2_RGB(self, perturbed_label, perturbed_label_classify, im_name, epoch, scheme='validate', is_scaling=True, perturbed_img=None):
    if (scheme == 'test' or scheme == 'eval') and is_scaling:
        perturbed_img_path = self.perturbed_test_img_path + im_name
        perturbed_img = cv2.imread(perturbed_img_path, flags=cv2.IMREAD_COLOR)
        perturbed_img = dataloader.resize_image(perturbed_img, 1024*2, 960*2)  # <=== see here
```
- In loss.py of your main project Dewarping-Document-Image-By-Displacement-Flow-Estimation-main, I need to change the matrix sizes to 1024*2, 960*2, is that right?
```python
self.matrices_2 = torch.full((1024, 960), 2, dtype=torch.float).cuda(self.args_gpu)  # <=== change here
self.matrices_0 = torch.full((1024, 960), 0, dtype=torch.float).cuda(self.args_gpu)  # <=== change here
```
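As a quick sanity check (assuming the flow map is doubled to 2048*1920 float32), the loss buffers themselves stay small, so resizing them should not be the memory bottleneck:

```python
# size of one (H, W) float32 buffer at the doubled resolution
H, W = 1024 * 2, 960 * 2   # assumed doubled flow-map size
bytes_per_float32 = 4
mib = H * W * bytes_per_float32 / 2**20
print(mib)  # 15.0 MiB per buffer
```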
Thank you for your support!
Simply changing save_img_shape to 512*4, 480*4 does not work; it results in negative indices returned from the function adjust_position(), for instance:

```
x_min= -112 x_max= 1904 y_min= 75 y_max= 1461
x_min= -104 x_max= 1896 y_min= 80 y_max= 1455
x_min= -104 x_max= 1896 y_min= 80 y_max= 1455
x_min= -96  x_max= 1888 y_min= 86 y_max= 1450
x_min= -112 x_max= 1904 y_min= 75 y_max= 1461
x_min= -120 x_max= 1912 y_min= 69 y_max= 1466
```
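A quick check on those numbers, under the assumption that save_img_shape is [height, width] so the x-axis canvas is 1920 px, suggests the distorted page is wider than the canvas itself, which would explain why adjust_position() cannot place it:

```python
# x-extents reported by adjust_position() at the 4x setting
boxes = [(-112, 1904), (-104, 1896), (-96, 1888), (-120, 1912)]
canvas_w = 1920  # assumed width axis of save_img_shape
widths = [x_max - x_min for x_min, x_max in boxes]
print(widths)                              # [2016, 2000, 1984, 2032]
print(all(w > canvas_w for w in widths))   # True: every box overflows
```

If so, the enlargement margin around the page (e.g. enlarge_img_shrink) likely has to grow along with save_img_shape rather than staying at its low-resolution value.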
Could you shed some light on this? Thanks.
Hi,
gwxie, thank you! I'm working on it and will let you know of any progress. Cheers.
Hi gwxie,
With save_img_shape and enlarge_img_shrink changed, I have successfully generated high-resolution (2048*1920) images for training. However, the image generation is very time-consuming (3-5 minutes each, single process) and storage-consuming (48 MB for a single generated data file (.gw)). That is to say, 480 GB of storage would be needed for 10,000 training images, so I generated only a dozen images for a dry run.
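The storage figure is just the obvious back-of-envelope product, which others planning the same experiment may find useful:

```python
# storage estimate for a full high-resolution training set
per_file_mb = 48          # one 2048*1920 .gw sample, as measured
n_images = 10_000
total_gb = per_file_mb * n_images / 1000  # decimal GB
print(total_gb)  # 480.0
```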
With your updated code, I managed to start training with batch_size=1. However, OOM was reported on my RTX 3060 with 12 GB of memory:
```
Traceback (most recent call last):
  File "train.py", line 327, in
```
Previously I used batch_size=2 for low-resolution (1024*960) images on the same RTX 3060 with 12 GB of memory. So maybe 20 GB or more is required for high-resolution (2048*1920) images even at batch_size=1.
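My estimate is a crude scaling argument: convolutional activation memory grows roughly linearly with pixel count, and the per-sample footprint at low resolution is bounded by half of the 12 GB that batch_size=2 filled. Those constants are guesses, not measurements:

```python
# rough activation-memory scaling from pixel count
low_px = 1024 * 960
high_px = 2048 * 1920
ratio = high_px / low_px
print(ratio)  # 4.0 times the pixels

per_sample_low_gb = 12 / 2        # batch_size=2 filled ~12 GB at low res
est_high_gb = per_sample_low_gb * ratio
print(est_high_gb)  # ~24 GB for one high-resolution sample
```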
Please let me know if I got something wrong. The training aborted, so I had no chance to test whether the _v3 code works.
Thank you very much anyway! I may try it out later if I get hold of more resources.
First I want to thank you for your great work!