gwxie / Dewarping-Document-Image-By-Displacement-Flow-Estimation

Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network
MIT License

Quality of de-warped images is low #8

Closed johnny0213 closed 2 years ago

johnny0213 commented 2 years ago

First I want to thank you for your great work!

johnny0213 commented 2 years ago

I have used my own data to fine-tune your pretrained model, which gives decent results after 60 epochs. A sample image is below. My question is that the image quality of the result images (the right part) seems lower than that of the originals (the left part), even though the resolutions are the same (1024*960). Lower quality will greatly affect downstream tasks such as OCR, so I wish to improve the quality of the result images. Do you have any idea how to do that? Your kind advice will be much appreciated!

[image: 20_1 copy]

gwxie commented 2 years ago


Hi, a low-resolution input (1024*960) may lead to image quality degradation. You can try setting is_scaling=True and increasing the image size. You can also try flatByRegressWithClassiy_triangular_v3_RGB in utils.py to see if it works.
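For context, a minimal sketch of the idea behind is_scaling, assuming the goal is simply to run the remap at double resolution. Here np.repeat (nearest-neighbor upscaling) is only a stand-in for the project's dataloader.resize_image / cv2.resize:

```python
import numpy as np

# Feed the network a 2x-upscaled input so the remapped output keeps
# more detail. Stand-in for dataloader.resize_image in the actual code.
def upscale2x(img: np.ndarray) -> np.ndarray:
    return img.repeat(2, axis=0).repeat(2, axis=1)

page = np.zeros((1024, 960, 3), dtype=np.uint8)  # stand-in for a test page
print(upscale2x(page).shape)  # (2048, 1920, 3)
```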

johnny0213 commented 2 years ago

Great to hear from you! I want to try it out.

  1. However, I cannot find flatByRegressWithClassiy_triangular_v3_RGB in utils.py; only the _v2 version is there.

  2. I use your other project, Distorted-Image-With-Flow-main, to pre-process my images and generate the trainset. Can I simply change save_img_shape to 512*4, 480*4 in perturbed_images_generation_multiProcess.py to generate images of size 2048*1920? Do I need to change anything else, such as reduce_value in the following line?

    def save_img(self, m, n, fold_curve='fold', repeat_time=4, relativeShift_position='relativeShift_v2'):
        origin_img = cv2.imread(self.path, flags=cv2.IMREAD_COLOR)

        save_img = None
        save_img_shape = [512*2, 480*2]   # <=== change here
        reduce_value = np.random.choice([8*2, 16*2, 24*2, 32*2, 40*2, 48*2], p=[0.1, 0.2, 0.4, 0.1, 0.1, 0.1])   # <=== see here
        base_img_shrink = save_img_shape[0] - reduce_value

  3. In utils.py of your main project, Dewarping-Document-Image-By-Displacement-Flow-Estimation-main, if is_scaling=True, then the resized image size is already 1024*2, 960*2. So I just need to set is_scaling=True and change nothing else, is that right?

    def flatByRegressWithClassiy_triangular_v2_RGB(self, perturbed_label, perturbed_label_classify, im_name, epoch, scheme='validate', is_scaling=True, perturbed_img=None):
        if (scheme == 'test' or scheme == 'eval') and is_scaling:
            perturbed_img_path = self.perturbed_test_img_path + im_name
            perturbed_img = cv2.imread(perturbed_img_path, flags=cv2.IMREAD_COLOR)
            perturbed_img = dataloader.resize_image(perturbed_img, 1024*2, 960*2)    # <=== see here

  4. In loss.py of your main project, Dewarping-Document-Image-By-Displacement-Flow-Estimation-main, I need to change these to 1024*2, 960*2, is that right?

    self.matrices_2 = torch.full((1024, 960), 2, dtype=torch.float).cuda(self.args_gpu)    # <=== change here
    self.matrices_0 = torch.full((1024, 960), 0, dtype=torch.float).cuda(self.args_gpu)    # <=== change here

Thank you for your support!

johnny0213 commented 2 years ago

Simply changing save_img_shape to 512*4, 480*4 does not work; it results in negative indices being returned from adjust_position(), for instance:

    x_min= -112  x_max= 1904  y_min= 75  y_max= 1461
    x_min= -104  x_max= 1896  y_min= 80  y_max= 1455
    x_min= -104  x_max= 1896  y_min= 80  y_max= 1455
    x_min= -96   x_max= 1888  y_min= 86  y_max= 1450
    x_min= -112  x_max= 1904  y_min= 75  y_max= 1461
    x_min= -120  x_max= 1912  y_min= 69  y_max= 1466

Could you shed some light on this? Thanks.
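For what it's worth, the boxes above suggest the perturbed page is being placed partly outside the output canvas, so adjust_position() returns corners with negative coordinates. A minimal sketch of a guard, assuming clamping is acceptable (clamp_box is a hypothetical helper, not part of the repo; rejecting and re-sampling the perturbation would be the alternative):

```python
# Hypothetical guard: clip a placement box to the canvas so negative
# indices never reach the array slicing downstream.
def clamp_box(x_min, x_max, y_min, y_max, height, width):
    return (max(x_min, 0), min(x_max, height),
            max(y_min, 0), min(y_max, width))

print(clamp_box(-112, 1904, 75, 1461, 2048, 1920))  # (0, 1904, 75, 1461)
```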

gwxie commented 2 years ago


Hi,

  1. Please update the code, which was uploaded recently.
  2. Like this:
         save_img_shape = [512*2, 480*2]        # ==> [512*4, 480*4]
         enlarge_img_shrink = [896*2, 768*2]    # ==> [896*4, 768*4]
  3. Yes, you are right.
  4. self.matrices_2/0 are dispensable; they are unused.
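The answers above amount to scaling all resolution-dependent generator constants by the same factor. A sketch of that rule, where the constant names follow perturbed_images_generation_multiProcess.py but the exact set of values that must scale together is an assumption:

```python
# Hypothetical helper: derive the generator's resolution-dependent
# constants from one scale factor so they stay mutually consistent
# (scale=2 -> 1024x960 output, scale=4 -> 2048x1920).
def scaled_config(scale: int) -> dict:
    return {
        "save_img_shape": [512 * scale, 480 * scale],
        "enlarge_img_shrink": [896 * scale, 768 * scale],
        "reduce_value_choices": [8 * scale, 16 * scale, 24 * scale,
                                 32 * scale, 40 * scale, 48 * scale],
    }

print(scaled_config(4)["save_img_shape"])  # [2048, 1920]
```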
johnny0213 commented 2 years ago

gwxie, thank you! I'm working on it and will let you know if there's any progress. Cheers.

johnny0213 commented 2 years ago

Hi gwxie,

With save_img_shape and enlarge_img_shrink changed, I have successfully generated high-resolution (2048*1920) images for training. However, image generation is very time-consuming (3-5 minutes each, single process) and storage-consuming (48 MB for a single generated data file (.gw)). That is to say, about 480 GB of storage is needed for 10,000 training images. So I just generated a dozen images for a dry run.
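One possible way to cut the per-sample footprint is compressed storage. This is only a sketch under the assumption that a .gw sample holds roughly an image plus a float32 flow field; the names and layout below are illustrative, not the repo's actual format:

```python
import os
import tempfile
import numpy as np

# Illustrative only: store the flow as compressed float16 instead of a
# raw pickle. The flow field dominates the footprint, so this can
# shrink each sample considerably at a small precision cost.
def save_sample(path, image, flow):
    np.savez_compressed(path, image=image, flow=flow.astype(np.float16))

def load_sample(path):
    data = np.load(path)
    return data["image"], data["flow"].astype(np.float32)

image = np.zeros((512, 480, 3), dtype=np.uint8)   # stand-in sample
flow = np.zeros((2, 512, 480), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "sample.npz")
save_sample(path, image, flow)
img2, flow2 = load_sample(path)
print(img2.shape, flow2.dtype)  # (512, 480, 3) float32
```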

With your updated codes, I managed to start training with batch_size=1. However OOM was reported on my rtx3060 with 12G memory:

Traceback (most recent call last):
  File "train.py", line 327, in <module>
    train(args)
  File "train.py", line 152, in train
    loss.backward()
  File "D:\Anaconda3\envs\lpgma\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda3\envs\lpgma\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 12.00 GiB total capacity; 9.83 GiB already allocated; 0 bytes free; 10.14 GiB reserved in total by PyTorch)

Previously I used batch_size=2 for low-resolution (1024*960) images on the RTX 3060 with 12 GB memory. So maybe 20 GB+ memory is required for high-resolution (2048*1920) images even at batch_size=1.

Please let me know if I got something wrong. The training aborted, so I had no chance to check whether the _v3 code works.
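The 20 GB+ guess is consistent with a simple pixel-count argument, assuming the activation memory of a fully convolutional network scales roughly linearly with input pixels (all figures here are rough, back-of-envelope numbers):

```python
# Doubling both H and W quadruples the pixel count, so per-sample
# activation memory grows about 4x. Figures are illustrative only.
def scaled_memory_gb(base_gb, base_hw, new_hw):
    return base_gb * (new_hw[0] * new_hw[1]) / (base_hw[0] * base_hw[1])

per_sample_gb = 10.0 / 2  # ~10 GB fit batch_size=2 at 1024x960
need = scaled_memory_gb(per_sample_gb, (1024, 960), (2048, 1920))
print(need)  # 20.0
```

Gradient checkpointing or mixed-precision training could bring that back under 12 GB, at the cost of speed or numeric range.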

Thank you very much anyway! I may try it out later if I get hold of more resources.