IAn2018cs opened 2 months ago
I have the same problem. Have you solved it?
I'm currently making changes to the scripts on my end to run multiple GPUs. I have quite a few requests and one GPU doesn't cut it. I know that the kohya version of Flux training can run on multiple GPUs.
Also looking into this!
Same problem here.
Yeah, same issue. Testing on one GPU works great, but I can't see myself using this in the future without multi-GPU support.
One way I see to train on multiple GPUs at once is to create several .yaml files, each with a different GPU and a different part of the dataset. This would require splitting the dataset into multiple parts and then, after training, combining the resulting .safetensors weights into a single file. I wouldn’t know how to do that merge.
However, the ideal solution would be to modify the code so that it uses multiple GPUs with a single .yaml file.
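For the merge step mentioned above, one simple (if approximate) option is to average the weights of the per-GPU checkpoints. This is my own sketch, not anything ai-toolkit ships: tensors are modeled as plain Python lists here, and with real .safetensors files you would load each state dict with `safetensors.torch.load_file()` and write the result with `safetensors.torch.save_file()` instead. Note that averaging LoRAs trained on disjoint data splits is not equivalent to training one LoRA on the full dataset.

```python
# Hypothetical sketch of merging several independently trained LoRA
# checkpoints by averaging their weights key by key.
# Tensors are stand-in Python lists; real code would use safetensors/torch.

def average_state_dicts(state_dicts):
    """Element-wise average of same-shaped weight dicts (one per GPU run)."""
    if not state_dicts:
        raise ValueError("need at least one state dict")
    merged = {}
    for key in state_dicts[0]:
        # Pair up corresponding elements across all checkpoints, then average.
        stacked = zip(*(sd[key] for sd in state_dicts))
        merged[key] = [sum(vals) / len(state_dicts) for vals in stacked]
    return merged

# Two partial runs with toy 1-D "tensors":
a = {"lora_A.weight": [1.0, 2.0]}
b = {"lora_A.weight": [3.0, 4.0]}
print(average_state_dicts([a, b]))  # {'lora_A.weight': [2.0, 3.0]}
```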
This has already been done with some other scripts, so all in all the functionality is there, and accelerate can be set up for multi-GPU from the start. It's just a matter of enabling more processes (one per GPU), loading each one with the dataset, and spreading the batch size across all the GPUs (which gives you a per-device batch size and a total batch size). All of this needs to be done on the same machine, i.e. rank 0.
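The batch-size and dataset split described above can be sketched like this (my own illustration, not ai-toolkit's actual code); in practice a launcher such as `accelerate launch --num_processes=2 train.py` starts one process per GPU:

```python
# Sketch of how a total batch size and a dataset are spread across GPUs
# in a DDP-style run. Function names are illustrative, not ai-toolkit's.

def per_device_batch(total_batch: int, world_size: int) -> int:
    """Split the total batch size evenly across `world_size` GPUs."""
    if total_batch % world_size != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch // world_size

def shard_dataset(items: list, rank: int, world_size: int) -> list:
    """Give each rank an interleaved slice of the dataset,
    similar to what a DistributedSampler does."""
    return items[rank::world_size]

# Example: total batch 8 on 2 GPUs -> 4 per device,
# and each rank sees half of the samples.
print(per_device_batch(8, 2))               # 4
print(shard_dataset(list(range(6)), 0, 2))  # [0, 2, 4]
print(shard_dataset(list(range(6)), 1, 2))  # [1, 3, 5]
```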
Hello, is there a way to use multiple GPUs in the ai-toolkit config? I'm trying to train with 2 x T4 GPUs on Kaggle. Thank you.
not yet
Also looking forward to the multi-gpu solutions!
Yep, please implement multi-GPU use.
I confirm, 2 x T4 GPUs on Kaggle do not work. Editing the file config/examples/train_lora_flux_24gb.yaml does not help; everything I tried failed:

- `device: cuda:0` -> only one GPU is used
- `# device: cuda:0` (commented out) -> CPU only, then an error
- `device: cuda` -> error
- `device: cuda:0,1` -> error
- `device: [cuda:0, cuda:1]` (YAML list form) -> error
@jwadow That's not how you run multi-GPU training; simply editing the config file won't work.
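To make the distinction concrete (this is my own sketch, not ai-toolkit's code): multi-GPU runs are driven by a launcher such as `accelerate launch --num_processes=2` or `torchrun --nproc_per_node=2`, which starts one process per GPU and tells each process its rank through environment variables like `LOCAL_RANK`. Each process then derives its own device, which is why a single `device:` key in the YAML cannot express a multi-GPU setup:

```python
# Each training process picks its device from LOCAL_RANK, which the
# launcher (torchrun / accelerate launch) sets before the script starts.
import os

def device_for_this_process() -> str:
    """Return the CUDA device string for this process based on LOCAL_RANK."""
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"

os.environ["LOCAL_RANK"] = "1"    # simulated: normally set by the launcher
print(device_for_this_process())  # cuda:1
```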
I need help training a Flux LoRA on multiple GPUs. The memory on a single GPU is not sufficient, so I want to train across several. However, configuring `device: cuda:0,1` in the config file doesn't work.
Could you please provide guidance on how to properly set up and run Flux LoRA training across multiple GPUs? The current single-GPU memory limitation is preventing me from training effectively.
Any assistance or examples of multi-GPU configurations for Flux LoRA would be greatly appreciated. Thank you!