ostris / ai-toolkit

Various AI scripts. Mostly Stable Diffusion stuff.
MIT License

The issue of facial contamination in LoRA #166

Open chaorenai opened 6 days ago

chaorenai commented 6 days ago

With a character LoRA, when the output is a group photo, the LoRA character's face contaminates the faces of the other people in the group. I have tried various methods, such as adjusting the dataset, lowering the learning rate, and layer-wise training, but none of them resolves the issue. What exactly is going wrong?

fofr commented 6 days ago

Have you tried using regularisation images?

kuzman123 commented 6 days ago

You need to provide more info: trained with or without captions? Network dim/alpha? Because if you've trained at dim 128, for example, it's most likely that your LoRA weights are huge and weaker tokens (the faces of random AI-generated humans) can't break through them. But anyway, to generate images with several different subjects, you just NEED to use attention masking and inpainting (I'm using ComfyUI for that, and it is amazing what you can achieve with masks + inpaint). What I like to do is find images with the desired composition (1 main subject with minions beside it, in your example), make a mask image for that subject, and use it as the attention mask in IPAdapter, or just use Faceswap. I don't think there is any other way to generate the image you want, because you will always have to compromise on the strength of your LoRA: by reducing it, you reduce the impact on other characters, but the likeness also decreases.
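The masking idea in the comment above can be illustrated at its simplest as pixel-level compositing: keep the LoRA-influenced render only where a mask marks the main subject, and keep a render without the LoRA everywhere else. This is a minimal sketch of that blend, not ComfyUI's IPAdapter attention masking or an actual inpainting pipeline; the tiny grayscale grids and the names `composite`, `without_lora`, `with_lora`, and `subject_mask` are hypothetical stand-ins for real images.

```python
# Illustrative only: pixel-level mask compositing, the basic idea behind
# restricting a subject to one region. Images are tiny grayscale grids
# (lists of rows) so the sketch has no dependencies.

def composite(base, subject, mask):
    """Blend two same-sized images: mask=1.0 takes `subject`, mask=0.0 keeps `base`."""
    if not (len(base) == len(subject) == len(mask)):
        raise ValueError("images and mask must have the same height")
    out = []
    for base_row, subj_row, mask_row in zip(base, subject, mask):
        if not (len(base_row) == len(subj_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        out.append([m * s + (1.0 - m) * b
                    for b, s, m in zip(base_row, subj_row, mask_row)])
    return out

# Hypothetical 2x4 "renders": one without the LoRA, one with the LoRA applied.
without_lora = [[0.2, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2]]
with_lora = [[0.9, 0.9, 0.9, 0.9], [0.9, 0.9, 0.9, 0.9]]
# The mask covers only the left half, where the main subject is assumed to be.
subject_mask = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]

result = composite(without_lora, with_lora, subject_mask)
print(result)
```

In this toy case the left half comes from the LoRA render and the right half from the plain render, which is why masking sidesteps the bleed instead of fixing it in training.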

chaorenai commented 6 days ago

Thank you for your reply. I was about to give up, but you gave me hope. Here's my training process:

All of the data was labeled, using natural language labeling generated by ChatGPT-4.0 or LLaMA 3.1. For the learning rate (lr), I tested lr: 0.00015, lr: 0.00025, lr: 0.0003, and lr: 0.0004. Regarding Dim/alpha, if I'm not using layered training, I mainly use 16/1. If I am using layered training, it's 128/128. For steps, I’ve tested 1000, 3000, 6000, and 10,000, but none of them solved the facial distortion issue. Could you please teach me how to use attention masking specifically?
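For comparing the dim/alpha pairs mentioned in this thread, it may help to look at the multiplier that the usual LoRA convention applies to the low-rank update, `alpha / dim`, optionally multiplied by the strength chosen at inference. The sketch below assumes that convention (the trainer or UI in use may differ), and `lora_scale` is a hypothetical helper, not a function from ai-toolkit.

```python
def lora_scale(dim: int, alpha: float, strength: float = 1.0) -> float:
    """Multiplier on the low-rank update under the common alpha/dim convention."""
    if dim <= 0:
        raise ValueError("dim must be positive")
    return strength * alpha / dim

# Dim/alpha pairs mentioned in this thread.
for dim, alpha in [(16, 1), (128, 128), (32, 32)]:
    print(f"dim={dim:>3} alpha={alpha:>3} -> scale={lora_scale(dim, alpha)}")

# Lowering the inference strength scales the whole update at once,
# so likeness and bleed onto other faces shrink together.
print(lora_scale(128, 128, strength=0.6))
```

Under this assumption, 16/1 applies a multiplier 16 times smaller than 128/128 or 32/32, so the same learning rate does not mean the same effective update size across those runs.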

Do you have X (formerly Twitter) or YouTube? I would like to follow you.

kuzman123 commented 5 days ago

I never EVER use captions for training faces, just trigger words (ohwx man, or ohwx woman, girl, etc.). The default LR of 1e-4 (0.0001) is good. Set dim/alpha to 32/32. Optimizer: I prefer Adafactor, but you can use AdamW8bit, Prodigy... 150 dataset repeats, save every 10-15 epochs. All of this falls apart if the dataset is not good, of course. I'm not on X; you can reach me on Instagram at @artproai

chaorenai commented 5 days ago

I know that using masks, InstantID, and inpainting during generation can control the output. However, what I hope to achieve is to solve the face contamination issue in the LoRA training itself. I've tried various methods, such as regularization and layer-wise training, but they all failed... I'll register an Instagram account and add you as a friend there to thank you. Thanks again!

chaorenai commented 5 days ago

Have you tried using regularisation images?

I’ve been following you on X and also left a comment on your X post about this issue. I have tested regularization, but it still doesn't solve the problem. When you trained a character LoRA, did you see any face contamination when generating a two-person or multi-person photo? How did you resolve it?

GXcells commented 4 days ago

Nothing works. Many, many people have tried it and discussed it in the Discord, and it is basically impossible with a LoRA. Try training a LoKr with SimpleTuner. People have managed to train several people in the same LoKr, so it should be possible to avoid bleeding with LoKr. I haven't seen it with my own eyes, though.

chaorenai commented 4 days ago

If the same LoRA is trained on several people, then group photos generated with this LoRA will also be limited to those specific people, right?

GXcells commented 4 days ago

I am not sure. You may need to prompt some facial characteristics of the other people to avoid this. Unfortunately, I can't install SimpleTuner at the moment, so I can't try it.

chaorenai commented 3 days ago

https://huggingface.co/TheLastBen/The_Hound This LoRA solves the problem of facial pollution, but I wonder whether that has anything to do with the subject being a celebrity?

chaorenai commented 3 days ago

This one was trained on only 2 layers, and its dim value is relatively low. After careful testing, there is still some facial pollution, but it is relatively minor.