ostris / ai-toolkit

Various AI scripts. Mostly Stable Diffusion stuff.
MIT License

How to fine tune a pretrained LoRA #119

Open davidmartinrius opened 3 months ago

davidmartinrius commented 3 months ago

Hi everyone,

I trained a LoRA. Now that I have enhanced my dataset, I would like to fine-tune my trained LoRA.

  1. Do you know how to do it? Are there instructions for this?

I tried changing the model name_or_path in the yaml configuration, but it needs a Hugging Face model with a config.json, which I don't have.

  2. On the other hand, when fine-tuning a model with this project, do I need to use the first dataset I used to train the first model plus the new dataset, or do I just need the new dataset?

Thanks!

steve84wien commented 3 months ago

Hey, if you mean resuming a training run, I would also like to know whether this is possible and, if so, what the conditions are (same rank, ...).

davidmartinrius commented 3 months ago

I didn't mean to resume, but it would be interesting to know that too. I meant to use pretrained weights + a new dataset.

WarAnakin commented 3 months ago

> I didn't mean to resume, but it would be interesting to know that too. I meant to use pretrained weights + a new dataset.

There is no using LoRAs as pretrained models. The best you could do is merge your LoRA into a model and use that for the pretrained model path.
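
For reference, here is a minimal sketch of one way such a merge could be done with the diffusers library, assuming a FLUX LoRA saved as a .safetensors file; the model id, folder paths, and file name below are placeholders:

```python
import torch
from diffusers import FluxPipeline

# Load the base model the LoRA was trained against (placeholder repo id).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Load the trained LoRA and bake its weights into the base model.
pipe.load_lora_weights("/path/to/lora_dir", weight_name="my_lora.safetensors")
pipe.fuse_lora()            # merge the LoRA deltas into the base weights
pipe.unload_lora_weights()  # drop the now-redundant adapter

# Save in diffusers format; the resulting folder contains the config files
# a pretrained model path is expected to have.
pipe.save_pretrained("/path/to/merged_model")
```

The saved folder can then be used as the pretrained model path for a new training run.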

davidmartinrius commented 3 months ago

Ok, thanks for your response @WarAnakin. Once the LoRA is merged into the base model, will the base model keep the trigger words? If I merge multiple LoRAs, will it keep the trigger words of all of them?

WarAnakin commented 3 months ago

Yes, your trigger word will be there, but the more you merge stuff, the less accurate your results will be. I would recommend training all your concepts into one LoRA (train with the text encoder enabled to achieve this), merging the result into the base model, and using that as your pretrained base.
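
Along the same lines, a hedged sketch of merging several LoRAs at once with diffusers, assuming they were all trained against the same base model; the adapter names, weights, and paths are placeholders:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Load each LoRA under its own adapter name (placeholder files).
pipe.load_lora_weights("/path/to/loras", weight_name="person_a.safetensors", adapter_name="person_a")
pipe.load_lora_weights("/path/to/loras", weight_name="person_b.safetensors", adapter_name="person_b")

# Activate both adapters (optionally with per-adapter weights), fuse, and save.
pipe.set_adapters(["person_a", "person_b"], adapter_weights=[1.0, 1.0])
pipe.fuse_lora(adapter_names=["person_a", "person_b"])
pipe.unload_lora_weights()
pipe.save_pretrained("/path/to/merged_model")
```

Whatever each LoRA learned, including its trigger-word associations, ends up baked into the merged weights, but as noted above, quality tends to degrade the more adapters you stack.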

davidmartinrius commented 3 months ago

Ok @WarAnakin, the thing is that I want to train 10,000 people into one LoRA.

  1. Is it feasible? Would it work for that number of people? Time and money are not a problem; I understand it could cost more than $10K and could take several days/weeks of training.

  2. How do I train it with the text encoder? Is this feature implemented for FLUX in this project? In the .yaml files I see "train_text_encoder: false # probably won't work with flux". But supposing that it works, will just setting this to true train the text encoder?

WarAnakin commented 3 months ago

@davidmartinrius That was going to be my next question: which architecture do you want to use (SDXL, Cascade, PixArt, FLUX, etc.)?

Given that you want to train this on FLUX, the text encoder is currently disabled (for both LoRA and dreambooth training).

  1. In theory, yes, it should be possible. It just depends on whether you want to be able to distinguish between those people; if so, you'd need a unique identifier/tag for each one of them.
  2. Training with the text encoder is a matter of having it enabled and setting its own learning rate, the same way we usually do when training the unet (see the sketch below).

To give you an idea, I speak from experience: I have trained everything you see on https://logodiffusion.com and https://imaginetees.ai, as well as the realistic base model that is in part responsible for the Juggernaut XL models, plus many others.

Currently, when training LoRAs for FLUX, we are only appending to existing concepts the base model already knows and can identify, which is why it is possible to train without captions. To give you a better idea of the current state of training multiple people, you can look here: https://imagetwist.com/p/WarAnakin/761062/Flux-Trainings. You will notice how it tends to have issues properly differentiating between these subjects and tends to fuse them.
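
As a generic illustration of the "its own learning rate" point from item 2 above (this is not ai-toolkit's actual code, just a PyTorch sketch with stand-in modules), the optimizer simply carries separate parameter groups for the unet/transformer and the text encoder:

```python
import torch
from torch import nn

# Stand-in modules; in a real trainer these would be the flux transformer
# (or unet) and the text encoder.
unet = nn.Linear(8, 8)
text_encoder = nn.Linear(8, 8)

# Two parameter groups, each with its own learning rate.
optimizer = torch.optim.AdamW(
    [
        {"params": unet.parameters(), "lr": 1e-4},
        {"params": text_encoder.parameters(), "lr": 5e-6},
    ]
)
```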

davidmartinrius commented 3 months ago

> In theory, yes, it should be possible. It just depends on whether you want to be able to distinguish between those people; if so, you'd need a unique identifier/tag for each one of them.

Yes, I know it often has issues differentiating between subjects. That forces me to generate images of only one specific person at a time. I understand this is a limitation for now, but it would be enough for me if I could generate images of specific people, one per image. I also know it works when putting multiple people in the prompt, but only after several attempts; it usually gets it wrong.

So... I understand you assume it is possible to train a large number of people into a single LoRA, just as is done with one or two; it is simply a matter of adding each person's name to their corresponding captions.

What worries me is that, with such a large model, it may tend not to converge and become a mess with so many people. I know that training 2 or 3 people works, but I don't know whether the model will be able to distinguish between thousands, even if the captions are well defined for each person. I think no one has tried this, at least not publicly.

So, for now I have no guarantees... I think the only way is to try it, but I am not going to waste money if I am not sure it will work.
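
On the captioning side, here is a small sketch of generating one caption file per image with a unique token per person. The one-folder-per-person layout and the caption wording are assumptions; plain .txt captions sitting next to the images match the toolkit's usual dataset convention (caption_ext: "txt"):

```python
from pathlib import Path

# Assumed layout: dataset/<person_id>/<image files>, one subfolder per person.
DATASET_DIR = Path("dataset")
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

for person_dir in sorted(p for p in DATASET_DIR.iterdir() if p.is_dir()):
    token = person_dir.name  # folder name doubles as the unique identifier/tag
    for image in sorted(person_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        # One caption .txt per image, same basename, containing the unique token.
        image.with_suffix(".txt").write_text(f"a photo of {token}", encoding="utf-8")
```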

davidmartinrius commented 3 months ago

> Training with the text encoder is a matter of having it enabled and setting its own learning rate, the same way we usually do when training the unet.

Do you know how to do this at the code level? What needs to be changed besides enabling it in the yaml?

WarAnakin commented 3 months ago

@davidmartinrius get in touch with me on this Discord; there is something I'd like to show you. My username is the same.

murtaza-nasir commented 2 months ago

Any new update on how to resume a LoRA? Let's say I trained a LoRA for 10,000 steps. Is there a way to train it again for another 5,000 steps?

davidmartinrius commented 2 months ago

@murtaza-nasir There is no way. You need to start from scratch.

murtaza-nasir commented 2 months ago

@davidmartinrius There is a way. See this: https://github.com/ostris/ai-toolkit/issues/48

davidmartinrius commented 2 months ago

Great, I'll try it...