ostris / ai-toolkit

Various AI scripts. Mostly Stable Diffusion stuff.
MIT License
3.23k stars · 325 forks

How to train Flux Lora on multiple GPUs? #73

Open IAn2018cs opened 2 months ago

IAn2018cs commented 2 months ago

I need help training Flux Lora on multiple GPUs. The memory on a single GPU is not sufficient, so I want to train on multiple GPUs. However, configuring device: cuda:0,1 in the config file doesn't seem to work.

Could you please provide guidance on how to properly set up and run Flux Lora training across multiple GPUs? The current single-GPU memory limitation is preventing me from training effectively.

Any assistance or examples of multi-GPU configurations for Flux Lora would be greatly appreciated. Thank you!

asizk commented 2 months ago

I have the same problem. Have you solved it?

WarAnakin commented 2 months ago

I'm currently making changes to the scripts on my end to run multi-GPU. I have quite a few requests and one GPU doesn't cut it. I know that the kohya version of Flux training can run on multiple GPUs.

cuba6112 commented 2 months ago

Also looking into this!

Eng-ZeyadTarek commented 2 months ago

Same problem here.

skein12 commented 2 months ago

Yeah, same issue. Testing on one GPU works great, but I can't see myself using this in the future without multi-GPU support.

davidmartinrius commented 2 months ago

One way I see to train on multiple GPUs at once is to create several .yaml files, each with a different GPU and a different part of the dataset. This would require splitting the dataset into multiple parts and then, after training, combining the resulting .safetensors weights into a single file. I wouldn’t know how to do that merge.

However, the ideal solution would be to modify the code so that it uses multiple GPUs with a single .yaml file.
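The "merge" step davidmartinrius is unsure about would, in the simplest case, be an element-wise average of the per-shard LoRA checkpoints. A minimal sketch of that idea follows; it is purely illustrative. In practice the tensors would be loaded with `safetensors.torch.load_file` and averaged with torch, but plain Python lists stand in for tensors here so the sketch stays dependency-free. Note that naive averaging is not equivalent to training on the full dataset — each shard's updates get diluted.

```python
# Hypothetical sketch of merging LoRA checkpoints trained on different
# dataset shards by element-wise weight averaging. Plain lists stand in
# for tensors; real checkpoints would be loaded via safetensors.

def average_state_dicts(dicts):
    """Element-wise average of several state dicts with identical keys."""
    keys = dicts[0].keys()
    assert all(d.keys() == keys for d in dicts), "checkpoints must share keys"
    merged = {}
    for k in keys:
        # Zip corresponding elements across checkpoints and average them.
        stacked = zip(*(d[k] for d in dicts))
        merged[k] = [sum(vals) / len(dicts) for vals in stacked]
    return merged

# Two toy "checkpoints", as if trained on different halves of the dataset.
ckpt_a = {"lora_A.weight": [0.2, 0.4], "lora_B.weight": [1.0, 0.0]}
ckpt_b = {"lora_A.weight": [0.6, 0.0], "lora_B.weight": [0.0, 2.0]}

merged = average_state_dicts([ckpt_a, ckpt_b])
print(merged)
```

This is why the single-config multi-GPU path is preferable: a proper data-parallel run averages gradients every step rather than averaging final weights once.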

WarAnakin commented 2 months ago

> One way I see to train on multiple GPUs at once is to create several .yaml files, each with a different GPU and a different part of the dataset. This would require splitting the dataset into multiple parts and then, after training, combining the resulting .safetensors weights into a single file. I wouldn’t know how to do that merge.
>
> However, the ideal solution would be to modify the code so that it uses multiple GPUs with a single .yaml file.

This has already been done with some other scripts; all in all, the functionality is there, and accelerate can be set up for multi-GPU from the start. It's just a matter of enabling more processes (one per GPU), loading the dataset on each, and spreading the batch size across all the GPUs (which gives you a per-device batch size and a total batch size). All of this needs to be done on the same machine ID, aka rank 0.
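The accelerate setup described above would look roughly like the following. This is a sketch assuming Hugging Face accelerate; the keys are the standard ones written by the interactive `accelerate config` wizard, but ai-toolkit's trainer would still need to be made accelerate-aware for this to take effect:

```yaml
# ~/.cache/huggingface/accelerate/default_config.yaml
# (normally generated interactively by running `accelerate config`)
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1          # single machine, i.e. rank 0 only
machine_rank: 0
num_processes: 2         # one process per GPU
gpu_ids: 0,1
mixed_precision: bf16
```

With this in place, training would be started via `accelerate launch` instead of plain `python`, and with 2 processes a per-device batch size of 2 gives an effective total batch size of 4.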

Teapack1 commented 2 months ago

Hello, is there a way to use multiple GPUs in the ai-toolkit config? I'm trying to train with 2× T4 GPUs on Kaggle. Thank you.

WarAnakin commented 2 months ago

> Hello, is there a way to use multiple GPUs in the ai-toolkit config? I'm trying to train with 2× T4 GPUs on Kaggle. Thank you.

not yet

dydxdt commented 2 months ago

Also looking forward to the multi-gpu solutions!

zhini-web commented 2 months ago

Also looking forward to the multi-gpu solutions!

sushmitxo commented 1 month ago

Yep, please implement multi-GPU use.

jwadow commented 1 month ago

I confirm: 2× T4 GPUs on Kaggle do not work. Editing the file config/examples/train_lora_flux_24gb.yaml does not help.

Everything I tried fails:

- `device: cuda:0` — only one GPU is used
- `# device: cuda:0` (commented out) — only CPU, then an error
- `device: cuda` — error
- `device: cuda:0,1` — error
- ```yaml
  device:
    - cuda:0
    - cuda:1
  ```
  — error

WarAnakin commented 4 weeks ago

@jwadow

That's not how you run multi-gpu training. Simply editing the config file won't work.