lksshw / SRNet

A PyTorch implementation of the SRNet architecture from the paper "Editing Text in the Wild" (Liang Wu et al.)

About the training results #7

Open wayne2tech opened 4 years ago

wayne2tech commented 4 years ago

Hi, Niwhskal. Following your suggestion, I used your pre-trained model and trained for about 150k iterations, but the final results are not very good, especially in the background inpainting module: its loss stays at around 4.612. Below are some training results. Can you give some effective suggestions? Thank you very much.

lksshw commented 4 years ago

Could you show me your t_sk generations, and also your input mask (mask_t) for the same example?

wayne2tech commented 4 years ago

> Could you show me your t_sk generations, and also your input mask (mask_t) for the same example?

These are my t_sk generations, but I cannot find the mask_t corresponding to that i_s; this is the mask of another picture. By the way, the model I saved is 200 MB, while the one you provided is 80 MB. Do you know the reason? I am very worried that the D_b loss has not converged. I really hope for your help. Thanks!


lksshw commented 4 years ago

While generating your data, make sure that the letters are spaced well. If the letters are cramped, the t_sk generations will be bad and the rendering will get worse too.

You can:

  1. Increase the space between letters by changing this line. Test out a few renders before settling on one.

  2. Train with the mask. Use the function build_l1_loss_with_mask in loss.py (a rough sketch of what that loss looks like is at the end of this comment).

Don't worry about the size of the model. It depends on your training and the values at the time of saving.
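Roughly, a masked L1 term looks something like the sketch below. The function name follows loss.py, but the exact text/background weighting there may differ; the 10:1 ratio here is just an assumption:

```python
import torch

def build_l1_loss_with_mask(x_t, x_o, mask):
    # x_t  : ground-truth image           (N, C, H, W)
    # x_o  : generator output             (N, C, H, W)
    # mask : binary text-region mask      (N, 1, H, W), 1 on the text strokes
    mask = mask.expand_as(x_t)                               # broadcast the mask over channels
    l1_text = (torch.abs(x_t - x_o) * mask).mean()           # error inside the text region
    l1_bg = (torch.abs(x_t - x_o) * (1.0 - mask)).mean()     # error in the background
    # up-weight the text region so the generator focuses on the strokes;
    # the 10:1 weighting is an assumption, not necessarily the repo's value
    return 10.0 * l1_text + l1_bg
```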

wayne2tech commented 4 years ago

> While generating your data, make sure that the letters are spaced well. If the letters are cramped, the t_sk generations will be bad and the rendering will get worse too.
>
> You can:
>
> 1. Increase the space between letters by changing this line. Test out a few renders before settling on one.
> 2. Train with the mask. Use the function build_l1_loss_with_mask in loss.py
>
> Don't worry about the size of the model. It depends on your training and the values at the time of saving.

OK, thank you. I will try it again. But I want to know why my db_loss is so high and the result is so bad. Is your o_b output also this bad? By the way, is your training data 100,000 images? My training data is only 6,000...

lksshw commented 4 years ago

Try to train with at least 40k images for good results. The more the better. I trained with 80k images.

wayne2tech commented 4 years ago

> Try to train with at least 40k images for good results. The more the better. I trained with 80k images.

many thanks, I will try~

zgw996 commented 4 years ago

Hi, Niwhskal. Thanks for your amazing work.

  1. I directly used the model and labels you provided and got satisfactory results, but when I used your model to try replacement in another language on the GPU, the font learned very strange colors and the background inpainting module (BIM) performed very poorly.

  2. In addition, when I train your model on the CPU (this time replacing English with English), the background inpainting module works well.

  3. Ignoring the differences in language, do you have any good suggestions for my output, mainly regarding the BIM results and the colors the font learns?

Here are some results.

lksshw commented 4 years ago

Try training with the mask (link).

zgw996 commented 3 years ago

Hi, Niwhskal. I got very good results with your code. In addition, I changed ResNet to SE-ResNet in the TCM, but I still have a question. Compared with the original picture, the replaced background looks blurred. When I paste the SRNet model's output image back into the original image, the visual result is very poor. How did you deal with this problem? In other words, how did you achieve the replacement effect in your GIF? Best wishes!

lksshw commented 3 years ago

> Hi, Niwhskal. I got very good results with your code. In addition, I changed ResNet to SE-ResNet in the TCM, but I still have a question. Compared with the original picture, the replaced background looks blurred. When I paste the SRNet model's output image back into the original image, the visual result is very poor. How did you deal with this problem? In other words, how did you achieve the replacement effect in your GIF? Best wishes!

SRNet does introduce some blur. This is because, while training, it scales an entire batch to the average height and width of the batch, so images with smaller dimensions get stretched, hence the blur. It is not too bad, however. You can reduce the blur by:

  1. Maintaining the height and width of the cropped image, i.e., make sure you don't change the dimensions of the cropped image.

  2. Using high-resolution images when testing.
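To make the stretching concrete, this is roughly what scaling a batch to its average height and width amounts to (a sketch only, not the repo's exact loader code):

```python
import torch
import torch.nn.functional as F

def scale_batch_to_average(images):
    # images: list of (C, H, W) tensors of varying sizes
    avg_h = int(sum(im.shape[1] for im in images) / len(images))
    avg_w = int(sum(im.shape[2] for im in images) / len(images))
    # every crop is interpolated to the batch-average size, so crops much
    # smaller than the average get stretched, which is where the blur comes from
    resized = [F.interpolate(im.unsqueeze(0), size=(avg_h, avg_w),
                             mode='bilinear', align_corners=False)
               for im in images]
    return torch.cat(resized, dim=0)    # (N, C, avg_h, avg_w)
```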

The way I did this in my demo (the GIF) was to use the CRAFT text detection model to get good bounding boxes, crop those boxes, pass them through SRNet, and paste SRNet's output back into the original image. The demo does all of this in one pass.

If you're planning on using it for a demo or as a product, I'd suggest you use a bounding box detector such as CRAFT.
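For anyone wanting to reproduce that flow, here is a minimal sketch of the crop → SRNet → paste-back loop. detect_boxes and srnet_infer are placeholders for a CRAFT wrapper and the SRNet forward pass, not functions from this repo:

```python
import cv2

def replace_text(image, i_t, detect_boxes, srnet_infer):
    # image        : full scene image, (H, W, 3) uint8
    # i_t          : rendered target-text image for the replacement word
    # detect_boxes : callable returning (x, y, w, h) word boxes, e.g. a CRAFT wrapper (placeholder)
    # srnet_infer  : callable running SRNet on (i_t, i_s) and returning the edited crop (placeholder)
    out = image.copy()
    for (x, y, w, h) in detect_boxes(image):
        i_s = image[y:y + h, x:x + w]        # source crop at its original resolution
        o_f = srnet_infer(i_t, i_s)          # edited crop from SRNet
        o_f = cv2.resize(o_f, (w, h))        # restore the crop's original size before pasting
        out[y:y + h, x:x + w] = o_f          # paste the edit back into the scene
    return out
```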

Asia-DMC commented 3 years ago

> Hi, Niwhskal. I got very good results with your code. In addition, I changed ResNet to SE-ResNet in the TCM, but I still have a question. Compared with the original picture, the replaced background looks blurred. When I paste the SRNet model's output image back into the original image, the visual result is very poor. How did you deal with this problem? In other words, how did you achieve the replacement effect in your GIF? Best wishes!

Have you solved the blur problem?

harshalchaudhari35 commented 1 year ago

hi @Niwhskal

I'm new to GANs and image datasets, so I'm trying to get the model working on higher-resolution cropped images. In an attempt to do so I changed the following parameters, hoping to get a larger scaling ratio and in turn be able to work with larger-resolution crops... perhaps there is another way?

batch_size = 4
data_shape = [128, None]

I'm facing similar issues as stated above, with my db_loss stuck near ~4.6.

> Maintaining the height and width of the cropped image, i.e., make sure you don't change the dimensions of the cropped image.

  1. Could you elaborate a bit on what you mean here? Why should we not change the dimensions of the cropped image?

Also, you mention using the function build_l1_loss_with_mask in loss.py.

  1. So does this function replace build_generator_loss, build_discriminator_loss, or both in train.py?

And lastly,

> Try to train with at least 40k images for good results. The more the better. I trained with 80k images.

  1. What was the number of iterations you used for the 80k images?

Sorry if my questions are naive. TIA