Open MazrimCoding opened 2 months ago
Can you share your environment? That includes the number of GPUs, your accelerate configuration, installed Python libraries, training configuration, the script or command you use to run sd-scripts, and so on.
The more detail you provide, the clearer a solution you will be able to get.
According to this issue, PyTorch 2.2.1 seems to work with train_network.py:
https://github.com/pytorch/pytorch/issues/116056
If 2.2.1 doesn't work, please share more details.
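In case it helps with gathering those details, here is a minimal sketch of a script that collects the Python version, a few package versions, and the visible GPUs into one report you can paste here. The function name collect_env_report and the default package list are my own assumptions, not part of sd-scripts, so adjust the list to whatever you actually have installed.

```python
import json
import platform
import sys
from importlib import metadata


def collect_env_report(packages=("torch", "accelerate", "transformers", "diffusers", "xformers")):
    """Return a dict describing the environment for a bug report.

    The default package list is only a guess at what is relevant here.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }

    # Look up installed versions without importing the packages themselves.
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = "not installed"

    # GPU details are only filled in if PyTorch can be imported.
    try:
        import torch
    except Exception:
        report["cuda_available"] = None
        report["gpu_count"] = None
        report["gpus"] = []
    else:
        report["cuda_available"] = torch.cuda.is_available()
        report["gpu_count"] = torch.cuda.device_count()
        report["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(report["gpu_count"])
        ]
    return report


if __name__ == "__main__":
    print(json.dumps(collect_env_report(), indent=2))
```

Run it inside the same virtual environment you use for training. The output of `accelerate env` and the exact launch command are still worth posting alongside it, since this sketch does not read the accelerate configuration.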
Has anyone successfully got this running?
No combination of accelerator settings, or even changing the script to use gloo, has let me run the script to completion.
I did get to the point of both GPUs being loaded; however, the script then hit the error "RuntimeError: Trying to create tensor with negative dimension", which I was not able to troubleshoot further.
I'm wondering whether it is simply not compatible for now, or whether it is even sensible, as I have heard that multi-GPU image training setups have issues with the seed not being properly shared between the GPUs.