ysy31415 / unipaint

Code Implementation of "Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model"
Apache License 2.0
103 stars 4 forks

Split the train and val/test pipeline. #7

Closed QixingJiang closed 4 months ago

QixingJiang commented 4 months ago

Dear researchers, I would like to express my gratitude for your excellent work on this project. I find it extremely interesting and valuable. However, I am wondering whether you have ever tried to separate finetuning and inference into a traditional train/test split: for example, first finetuning the model on a batch of data, and then using the finetuned model for inference. I ask because the current method requires a long finetuning time before generating each image (compared to the sampling time). I have tried this myself, but the results were not satisfactory. I would greatly appreciate any experience or advice you could share with me.

I am looking forward to your reply. Thank you in advance!

Best regards.

ysy31415 commented 4 months ago

Hi, thanks for bringing up your concern! This method needs minutes of finetuning for each input, which is indeed time-consuming.

From your description, it sounds like you want to finetune the model on multiple subjects (e.g., A, B, C, ...) simultaneously, so that you can directly run the model to generate A, B, or C at test time. If I understand correctly: I haven't tried this solution myself, but I also expect the results would not be good. I am not aware of any work that fits multiple subjects into shared model weights through a single finetuning run, and that objective may not be easy to learn.

But there are some alternatives for this issue: (1) Train a feed-forward model that directly takes the subject image as an input condition at inference, which is much faster, e.g., Paint-by-Example, AnyDoor, or IP-Adapter. (2) Train LoRA models, one LoRA per subject. These LoRAs are plug-and-play and can be loaded into the base model when needed. In other words, you first pretrain a set of LoRA models {A, B, C}; when you want to generate A, you load LoRA A for inference, with no finetuning needed.
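The per-subject LoRA workflow in option (2) amounts to a simple subject-to-weights lookup at test time. A minimal sketch is below; the file paths and the commented-out diffusers calls are illustrative assumptions, not part of this repository.

```python
# Sketch of the "one pretrained LoRA per subject" workflow: pick the
# right weight file for a subject, then load it into a base pipeline.
# Paths and model names here are hypothetical placeholders.

lora_weights = {
    "A": "loras/subject_A.safetensors",
    "B": "loras/subject_B.safetensors",
    "C": "loras/subject_C.safetensors",
}

def select_lora(subject):
    """Return the pretrained LoRA file for a subject; no test-time finetuning."""
    if subject not in lora_weights:
        raise KeyError(f"No pretrained LoRA for subject {subject!r}")
    return lora_weights[subject]

# Hypothetical inference-time usage with a diffusers-style pipeline:
#   pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
#   pipe.load_lora_weights(select_lora("A"))  # plug-and-play, no finetuning
#   result = pipe(prompt="a photo of A", image=image, mask_image=mask)
```

The point of the design is that finetuning cost is paid once per subject, offline, while inference only pays the cost of loading a small weight file.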

QixingJiang commented 4 months ago


Thank you for your prompt response. In fact, in my task I need to erase the same kind of subject from a scene, such as erasing all people across different images. This is a challenging problem, because the content that should fill the erased region depends on the specific context of each image. Your method is well ahead of related work in quality, because finetuning on each image lets the model learn the image's context during generation, thus producing a more realistic filled background. Thank you again for your contributions to the community. If you have any relevant experience or good practices for this kind of erasing and inpainting model, I hope you can share them. Thank you!

ysy31415 commented 4 months ago

Okay, I have read some works which might be related to your topic:

- Inpaint Anything: Segment Anything Meets Image Inpainting [https://github.com/geekyutao/Inpaint-Anything]
- Inst-Inpaint: Instructing to Remove Objects with Diffusion Models [http://instinpaint.abyildirim.com/]
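These works share a segment-then-inpaint pattern: obtain per-object masks (e.g., from a segmentation model such as SAM), merge and slightly dilate them, then hand the combined mask to an inpainting model to fill the background. A minimal NumPy sketch of the mask-preparation step, with illustrative function names, might look like this:

```python
import numpy as np

def union_masks(masks):
    """Combine per-instance masks (e.g., one per detected person) into one removal mask."""
    out = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        out |= m.astype(bool)
    return out

def dilate(mask, r=1):
    """Simple box dilation so the inpainting region fully covers object boundaries."""
    h, w = mask.shape
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out

# The combined, dilated mask would then be passed to an inpainting model
# (e.g., a diffusion inpainting pipeline, or Uni-paint) to fill the background.
```

Dilating the mask is a common practical trick: segmentation masks often hug object edges, and a slightly larger inpainting region avoids leftover halos around the removed subject.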

Maybe you have read these works before. I am not an expert in this field, but I hope they will be helpful for your research.

QixingJiang commented 4 months ago

Thank you for your patient answers and help; I wish you all the best. I will close this issue and try the solutions you provided.