Open kopyl opened 2 months ago
If you remove these args:
--validation_prompt
--validation_epochs
Then the training does not stop.
Do you mean to say that performing validation in dreambooth-flux training does not work with the current scripts?
@a-r-r-o-w yep. The problem is that the model can't be loaded for inference.
Is there any update on this issue? I also ran into it with the latest diffusers 0.32. It seems that after the model is wrapped with FSDP, the shapes of some of the parameters change. I also wonder whether this is related to this issue: https://github.com/huggingface/transformers/issues/30228 @kopyl Is there any workaround?
cc @linoytsaban @sayakpaul
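One way to narrow down the FSDP shape change described above is to record parameter shapes before wrapping and compare them with the shapes seen afterwards. The following is only a minimal sketch, not part of the training script: `find_shape_mismatches` is a hypothetical helper, the parameter names and shapes are made-up example values, and in practice the dictionaries would presumably be built from something like `{name: tuple(p.shape) for name, p in model.named_parameters()}`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    expected: Dict[str, Shape], actual: Dict[str, Shape]
) -> List[str]:
    """Return one readable line per parameter that is missing or has a different shape."""
    problems = []
    for name, want in sorted(expected.items()):
        got = actual.get(name)
        if got is None:
            problems.append(f"{name}: missing (expected {want})")
        elif tuple(got) != tuple(want):
            problems.append(f"{name}: expected {want}, got {tuple(got)}")
    return problems


# Hypothetical example values: a 2-D weight that shows up flattened to 1-D.
expected = {"proj.weight": (4, 8), "proj.bias": (4,)}
actual = {"proj.weight": (32,), "proj.bias": (4,)}

report = find_shape_mismatches(expected, actual)
print("\n".join(report))
```

If the report lists weights that have become one-dimensional, that would be consistent with the wrapped model exposing flattened parameters, though this check alone does not confirm the cause.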
I think this should be solved now. Can you try with the recent versions and ensure diffusers is installed from main?
Also, maybe try after changing your accelerate config to use a single GPU (num_processes: 1)?
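A quick way to check which diffusers build is actually installed is to read the package metadata. This is a rough sketch under one assumption: that a source install from the main branch reports a version containing `.dev` (for example `0.33.0.dev0`), while a release from PyPI does not. Both helper functions are hypothetical names introduced for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def looks_like_dev_build(version: str) -> bool:
    """Assumption: builds installed from main carry a '.dev' version suffix."""
    return ".dev" in version


version = installed_version("diffusers")
if version is None:
    print("diffusers is not installed in this environment")
elif looks_like_dev_build(version):
    print(f"diffusers {version} looks like a development (main) build")
else:
    print(f"diffusers {version} looks like a tagged release")
```

Running this inside the same environment that launches the training avoids confusion when several Python environments are present.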
@weixiong-ur I did not find a solution and decided to switch to kohya-ss sd-scripts for training, which almost works wonders.
@sayakpaul the thing is that I had to use 2 GPUs to fit everything into memory...
Describe the bug
I run the training but get this error
Reproduction
Run
accelerate config
Logs
System Info
Ubuntu 20.04
2x NVIDIA H100
CUDA 12.2
torch==2.4.1
torchvision==0.19.1
Diffusers commit: https://github.com/huggingface/diffusers/commit/ba5af5aebbac0cc18168076a18836f175753d1c7
Who can help?
No response