htzheng / CM-GAN-Inpainting

CM-GAN for Image Inpainting
Apache License 2.0

Lama-OT in Appendix: train from scratch or finetune from Big-Lama? #4

vkhoi opened this issue 2 years ago

vkhoi commented 2 years ago

Dear authors, thanks for the great work. Regarding the Lama-OT model in Appendix Section B, did you train it from scratch (so that it is aware of object masks right from the start), or is it possible to achieve the results in Fig. 8 and Fig. 9 (Appendix) by fine-tuning from Big-LaMa? Thank you!

htzheng commented 2 years ago

Thank you for raising this point. The experiment in Appendix Section B was done by fine-tuning the official Big-LaMa and CoModGAN checkpoints with object-aware masks on 8 GPUs for ~3-4 days. This setting is fairer, since starting from well-trained checkpoints helps Lama-OT and CoModGAN-OT achieve their optimal performance.
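For readers who want to reproduce this setup, a minimal sketch of resuming training from pretrained G/D weights is below. The function and checkpoint key names are illustrative assumptions, not the actual LaMa or CoModGAN loading code:

```python
# Hypothetical sketch (not the authors' code): start fine-tuning from
# pretrained generator/discriminator checkpoints, then continue training
# with object-aware masks instead of the original random masks.
import torch

def load_pretrained(generator, discriminator, ckpt_path, device="cuda"):
    """Load G/D weights so fine-tuning starts from a well-trained model."""
    ckpt = torch.load(ckpt_path, map_location=device)
    # Key names depend on how the original checkpoint was saved.
    generator.load_state_dict(ckpt["generator"], strict=False)
    discriminator.load_state_dict(ckpt["discriminator"], strict=False)
    return generator, discriminator
```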

vkhoi commented 2 years ago

Thank you for your answer. Can I ask a few more questions about your experience fine-tuning Big-LaMa with object-aware masks?

  1. During the fine-tuning process, did you ever encounter inpainting results where Big-LaMa would just fill the masked region with a single color (see attached)? This does not always happen, only sometimes, which I find strange.
  2. Can you also share (just briefly is fine) your fine-tuning recipe for Big-LaMa? For example, Big-LaMa is trained on 256x256 crops of unresized images (as opposed to CM-GAN, which is trained on 512x512 resized images). Did you keep the same 256x256 crop setting, or did you switch to training on 512x512 resized images as well? If you kept the 256x256 crop setting, the masks generated by CoModGAN would be huge and almost always occupy the entire 256x256 input, so did you also modify CoModGAN's mask-generation hyperparameters for Big-LaMa?

Thank you in advance!

htzheng commented 2 years ago

  1. Yes, Big-LaMa sometimes inpaints a single color, because the large perceptual-loss weight LaMa uses pushes the model toward generating averaged pixels. In addition, its generator and discriminator have relatively local receptive fields, so there may not be a strong constraint on global structure.
  2. I followed the same training scheme as LaMa, i.e. the training resolution is 256 while testing is at a higher resolution, i.e. 512. Because of the mask issue you mentioned (the mask is huge after cropping), I simply resize the 512x512 object-aware mask to 256x256 for training, roughly as in the sketch below. This scheme generalizes reasonably well to 512x512 testing, but it may be possible to design better masks for training LaMa.
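
A minimal sketch of that resizing step, assuming the object-aware mask is a binary uint8 NumPy array at 512x512 (the function name is illustrative, not the repo's actual code):

```python
import cv2
import numpy as np

def downscale_mask(mask_512: np.ndarray, size: int = 256) -> np.ndarray:
    """Resize a 512x512 binary mask to the 256x256 LaMa training resolution.

    Nearest-neighbor interpolation keeps the mask strictly binary instead of
    introducing soft gray edges at object boundaries.
    """
    mask_256 = cv2.resize(mask_512, (size, size), interpolation=cv2.INTER_NEAREST)
    return (mask_256 > 0).astype(np.uint8)
```
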
vkhoi commented 2 years ago

Thanks again. Just want to clarify one more thing, since it was not explicitly stated in your answer: when generating object-aware masks to fine-tune LaMa, did you use the masks from CoModGAN (i.e., this code) or did you keep the masks used by LaMa (i.e., this code)?

htzheng commented 2 years ago

I used the object-aware mask generation procedure from Appendix E.2 to generate the masks. The code is in mask_generator/mask_generator.py. Basically, it mixes CoModGAN masks and random object masks during training.
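As a rough illustration of that mixing strategy (this is not the actual contents of mask_generator/mask_generator.py; the sampling probability and helper names are hypothetical):

```python
import random
import numpy as np

def sample_training_mask(comodgan_mask_fn, object_mask_fn, p_object: float = 0.5) -> np.ndarray:
    """Mix two mask sources when sampling a training mask.

    comodgan_mask_fn: callable returning a CoModGAN-style random free-form mask.
    object_mask_fn:   callable returning a mask derived from an object segmentation.
    p_object:         probability of drawing an object mask (hypothetical value).
    """
    if random.random() < p_object:
        return object_mask_fn()
    return comodgan_mask_fn()
```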

mingqizhang commented 2 years ago

@htzheng Hi, I fine-tuned CoModGAN with object-aware masks, starting from the pretrained weights of G and D. The learning rate is set to 0.0001. I tried both LSGAN loss and softplus GAN loss for D; some results always come out as a single flat color, and during fine-tuning the D loss becomes small, which means it can easily discriminate real images from fake images. Can you share (just briefly is fine) your fine-tuning recipe for CoModGAN? Thanks. Some results are attached below.

htzheng commented 2 years ago

@mingqizhang Hi, some configs I used for fine-tuning CoModGAN (in the stylegan2-ada-pytorch fashion) are: --mirror=1 --gpus=8 --batch 32 --workers 4 --fp32 true --aug=noaug for training. The detailed hyperparameters: {'comodgan512': dict(ref_gpus=8, kimg=50000, mb=32, mbstd=4, fmaps=1, lrate=0.001, gamma=10, ema=10, ramp=None, map=8),}

I guess using a larger learning rate, 8 GPU training and usually 2-3 days of training would achieve similar results.
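
If you are adapting the stock stylegan2-ada-pytorch train.py, the hyperparameters above would correspond to a preset like the following in its cfg_specs dict. This is a sketch: the 'comodgan512' key and the comments are my additions, not part of the stock repo or an official CM-GAN config:

```python
# Sketch: registering the hyperparameters from the comment above as a
# config preset in stylegan2-ada-pytorch's train.py ('comodgan512' is a
# hypothetical key name).
cfg_specs = {
    'comodgan512': dict(
        ref_gpus=8,     # reference GPU count the preset was tuned for
        kimg=50000,     # total training length in thousands of images
        mb=32,          # total minibatch size
        mbstd=4,        # minibatch-stddev group size
        fmaps=1,        # feature-map multiplier
        lrate=0.001,    # learning rate for G and D
        gamma=10,       # R1 regularization weight
        ema=10,         # EMA half-life for G weights, in kimg
        ramp=None,      # no EMA ramp-up
        map=8,          # mapping network depth
    ),
}
```

With such a preset registered, training could then be launched with something like `python train.py --cfg=comodgan512 --mirror=1 --gpus=8 --batch 32 --workers 4 --fp32 true --aug=noaug`, an assumed invocation based on the stock repo's CLI and the flags listed above.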

Sanster commented 1 year ago

@vkhoi Hello, I am also training LaMa-OT, but so far I haven't achieved very good results. I'm wondering how your training turned out. If you could share some experiences or results, I would greatly appreciate it. Thank you.