open-mmlab / PowerPaint

[ECCV 2024] PowerPaint, a versatile, high-quality image inpainting model that supports text-guided object inpainting, object removal, image outpainting, and shape-guided object inpainting with a single model.
https://powerpaint.github.io/
MIT License

Questions about base model and prompt for PowerPaintV2 #34

Open dydxdt opened 2 months ago

dydxdt commented 2 months ago

Great work! I have some questions about PowerPaintV2.
1) I see that the base model is RealisticVision (RV for short) in gradio_PowerPaint_BrushNet.py, and the BrushNet repo also offers the RV model. Does PowerPaintV2 need to be retrained, or is merging BrushNet with the original PowerPaint enough for inference?
2) The prompts used in the PowerPaint v1 and v2 inference scripts are different. Why is that? For example, V1 has something like ', worst quality, low quality, normal quality, bad quality, blurry P_shape', while V2 doesn't. And is a prompt like 'empty scene' a custom design?
3) Can you give some advice on training scripts for reference? I may want to train the model myself.
Thanks very much!

sherlhw commented 2 months ago

+1

lijiaxing0213 commented 2 months ago

PowerPaintV2 combined with BrushNet does need training, because introducing BrushNet changes the original distribution of the latent features, so merging BrushNet with the original PowerPaint is not enough on its own. Furthermore, the main idea of PowerPaint is to optimize the embeddings of the task prompts. You can refer to the implementation of the EmbeddingLayerWithFixes class at https://mmagic.readthedocs.io/en/latest/autoapi/mmagic/models/editors/disco_diffusion/clip_wrapper/index.html#mmagic.models.editors.disco_diffusion.clip_wrapper.EmbeddingLayerWithFixes. Pay attention to the methods add_tokens() and add_embeddings().
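
To make the "optimize the embedding of the task prompt" idea concrete, here is a minimal, dependency-free sketch. It is not the MMagic EmbeddingLayerWithFixes implementation and does not reproduce the signatures of add_tokens() or add_embeddings(). The class ToyTokenEmbedding, its methods, the tiny vocabulary, and the constant gradients are all hypothetical and for illustration only. The only name taken from this thread is the task token P_shape. The sketch shows one possible reading of the approach: a new task token gets its own embedding row, and only that row is updated while the original vocabulary stays frozen.

```python
import random


class ToyTokenEmbedding:
    """Toy embedding table (hypothetical): frozen base vocab plus trainable added tokens."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        # One small random vector per base token; these rows are never updated.
        self.weights = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]
        # Ids of rows that are allowed to change during training.
        self.trainable_ids = set()

    def add_token(self, token, init_from=None):
        """Append a new trainable row, optionally copied from an existing token."""
        if token in self.token_to_id:
            raise ValueError(f"token already exists: {token}")
        if init_from is not None:
            row = list(self.weights[self.token_to_id[init_from]])
        else:
            row = [0.0] * self.dim
        new_id = len(self.weights)
        self.token_to_id[token] = new_id
        self.weights.append(row)
        self.trainable_ids.add(new_id)
        return new_id

    def embed(self, tokens):
        """Return a copy of the embedding vector for each token."""
        return [list(self.weights[self.token_to_id[t]]) for t in tokens]

    def sgd_step(self, tokens, grads, lr=0.1):
        """Apply one gradient step per token, skipping frozen rows."""
        for tok, grad in zip(tokens, grads):
            idx = self.token_to_id[tok]
            if idx not in self.trainable_ids:
                continue  # base vocabulary stays untouched
            self.weights[idx] = [w - lr * g for w, g in zip(self.weights[idx], grad)]


# Usage: add the task token, then apply a fake gradient to a base token and to it.
emb = ToyTokenEmbedding(["a", "photo", "of", "cat"], dim=4)
emb.add_token("P_shape", init_from="photo")

before_frozen = emb.embed(["cat"])[0]
before_new = emb.embed(["P_shape"])[0]

fake_grads = [[1.0] * 4, [1.0] * 4]  # placeholder gradients, not from a real loss
emb.sgd_step(["cat", "P_shape"], fake_grads)

after_frozen = emb.embed(["cat"])[0]
after_new = emb.embed(["P_shape"])[0]
```

In a real training script the gradients would come from the diffusion loss, and the rest of the model (including BrushNet) would be trained or frozen according to the authors' recipe, which this thread does not specify.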

sherlhw commented 2 months ago

Thanks for your reply!