tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Apache License 2.0
4.46k stars, 289 forks

How should I go about training IP-Adapter Plus SDXL for inpainting by providing it a reference image? #391

Open HG2407 opened 2 days ago

HG2407 commented 2 days ago

From what I gathered from multiple issues (#263, #261, #162), the training JSON should look like this: { "prompt": "description of reference", "reference": "reference.jpg", "output": "groundtruth.jpg" }

But I want to provide the training data in this format:

{ "mask": "blackandwhitemask.jpg", "reference": "reference.jpg", "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg" }

or

{ "prompt": "description of the reference image", "mask": "blackandwhitemask.jpg", "reference": "reference.jpg", "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg" }

Is it possible to create a dataset like this? Is there a better way? I think I should focus on tutorial_train_plus.py; would I need to change anything so that I can provide the mask as well for inpainting? How big should the training dataset be if I want to fine-tune it? Just to be clear, I am not training it on faces.
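To make the two proposed formats concrete, here is a minimal sketch of how such records could be parsed and validated before they reach a training script. It is not part of the IP-Adapter repository: the names `InpaintRecord`, `parse_records`, and `REQUIRED_KEYS` are hypothetical, the file is assumed to be a JSON array of records, and a missing "prompt" is assumed to default to an empty string.

```python
import json
from dataclasses import dataclass

# Keys every record must carry in the proposed format (assumption, not repo API).
REQUIRED_KEYS = ("mask", "reference", "input", "output")


@dataclass
class InpaintRecord:
    """One training example: file paths plus an optional text prompt."""
    mask: str
    reference: str
    input: str
    output: str
    prompt: str = ""  # empty when the record has no "prompt" key


def parse_records(text):
    """Parse a JSON array of records; raise ValueError if a key is missing."""
    records = []
    for i, item in enumerate(json.loads(text)):
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
        records.append(
            InpaintRecord(
                mask=item["mask"],
                reference=item["reference"],
                input=item["input"],
                output=item["output"],
                prompt=item.get("prompt", ""),
            )
        )
    return records


# Both proposed formats, with and without a prompt.
example = """[
  {"mask": "blackandwhitemask.jpg", "reference": "reference.jpg",
   "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg"},
  {"prompt": "description of the reference image",
   "mask": "blackandwhitemask.jpg", "reference": "reference.jpg",
   "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg"}
]"""

records = parse_records(example)
```

A dataset class in a training script could then iterate over `records` and load the image, mask, and reference files from the stored paths. How the mask is fed to the model is a separate question that this sketch does not address.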