How should I go about training IP-Adapter Plus SDXL for inpainting with a reference image?

From what I gathered from multiple issues (#263, #261, #162), the training JSON should look like this: { "prompt": "description of reference", "reference": "reference.jpg", "output": "groundtruth.jpg" }

But I want to provide the training data in one of these formats, either without a prompt or with one:

{ "mask": "blackandwhitemask.jpg", "reference": "reference.jpg", "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg" }

{ "prompt": "description of the reference image", "mask": "blackandwhitemask.jpg", "reference": "reference.jpg", "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg" }

Is it possible to create a dataset like this? Is there a better way? I think I should focus on tutorial_train_plus.py; would I need to change anything so that I can also provide the mask for inpainting? How big should the training dataset be if I want to fine-tune it? Just to be clear, I am not training it on faces.
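
To make the proposed format concrete, here is a minimal sketch of how such records might be parsed and validated before being handed to a dataset class. It only covers reading the JSON: the function name `load_inpaint_records` is hypothetical, it assumes the file holds a JSON list of records with the field names used above, and it does not reproduce or modify the dataset class in tutorial_train_plus.py. How the mask would be consumed by the training loop is left open.

```python
import json

# Field names taken from the proposed record format; "prompt" is optional.
REQUIRED_KEYS = ("mask", "reference", "input", "output")


def load_inpaint_records(json_text):
    """Parse a JSON list of records and check each one has the required keys.

    Hypothetical helper: assumes the training file is a JSON list of objects.
    """
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")

    cleaned = []
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
        item = {k: rec[k] for k in REQUIRED_KEYS}
        # Fall back to an empty prompt when the record has none.
        item["prompt"] = rec.get("prompt", "")
        cleaned.append(item)
    return cleaned


example = """
[
  {"mask": "blackandwhitemask.jpg", "reference": "reference.jpg",
   "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg"},
  {"prompt": "description of the reference image",
   "mask": "blackandwhitemask.jpg", "reference": "reference.jpg",
   "input": "originalimage.jpg", "output": "inpaintedoutputusingthemask.jpg"}
]
"""

records = load_inpaint_records(example)
for rec in records:
    print(rec["input"], "->", rec["output"], "| prompt:", repr(rec["prompt"]))
```

Both variants (with and without a prompt) pass through the same loader, so a single dataset file could mix them; whether an empty prompt is a sensible default for training is an assumption of this sketch.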

tencent-ailab / IP-Adapter

How should I go about training IP-Adapter Plus SDXL for inpainting with a reference image? #391