tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Apache License 2.0
4.46k stars, 289 forks

Training without text #382

Open athena913 opened 3 weeks ago

athena913 commented 3 weeks ago

Hi,

1) Is it possible to train the IPAdapter model without text? The training tutorial code expects the input data to be a list of {image_file, text} entries. Since the IPAdapter is a visual adapter, and the base SD model has already been trained on <image, text> pairs, why does training require a dataset with text?

2) I am currently using your pretrained IPAdapter (not the face model, just the base IP adapter) in inference mode for pose transfer. I provide an upper-body image of the person as input, and the face in the generated image looks somewhat like the input person, but it is not an exact replica. Would training the IPAdapter on a human face dataset such as FFHQ or CelebFaces (without any text annotation) give a more faithful reconstruction of the source person?

Thank you.

xiaohu2015 commented 1 week ago

Yes, it can be trained without text.
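Since the tutorial expects a list of {image_file, text} entries, one way to train without text is to keep that format and give every image an empty caption. The sketch below is a minimal illustration under stated assumptions, not the repository's own tooling: it assumes an empty string is an acceptable "no text" caption (the same trick commonly used for text dropout in classifier-free guidance training), that `image_file` is a name relative to the image directory the training script reads from, and that the helper name `build_textless_manifest` is hypothetical.

```python
import json
from pathlib import Path

# Assumed set of image extensions to include; adjust for your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def build_textless_manifest(image_dir, out_json):
    """Hypothetical helper: write a JSON list of {"image_file", "text"} records
    in which every caption is the empty string (i.e. no text conditioning).

    Assumes the training script joins `image_file` onto `image_dir`, so only
    the bare file name is stored.
    """
    image_dir = Path(image_dir)
    records = [
        {"image_file": p.name, "text": ""}  # empty caption = no text prompt
        for p in sorted(image_dir.iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    ]
    Path(out_json).write_text(json.dumps(records, indent=2))
    return records
```

For a face-only dataset such as FFHQ, the resulting file would be supplied wherever the tutorial expects its data JSON. Whether all-empty captions train as well as real captions is not established in this thread and is worth checking empirically.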

For faces, you can use a dedicated face model to improve the results.