tianweiy / DMD2

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

When I train the SDXL model, saving the FSDP model causes training to stop #25

Closed: icelighting closed this issue 4 months ago

icelighting commented 4 months ago

The process hangs.

tianweiy commented 4 months ago

Thank you for your interest. Could you provide a bit more information?

Specifically, at which line of code does the process hang?

Here https://github.com/tianweiy/DMD2/blob/111c8bd42c5e89b9e35e8d1d5c1ca9cea3829386/main/train_sd.py#L176

or

https://github.com/tianweiy/DMD2/blob/111c8bd42c5e89b9e35e8d1d5c1ca9cea3829386/main/train_sd.py#L290

Additionally, does it save the file to the disk or not (you can check the folder)?
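
One generic way to find the line where a Python process hangs is the standard-library faulthandler module, which can print every thread's traceback if the process is still running after a timeout. The sketch below is not part of the DMD2 code; the helper names enable_hang_dump and disable_hang_dump are hypothetical, and the timeout is an arbitrary value to adjust.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s, log_file=None):
    """Dump all thread tracebacks every `timeout_s` seconds until cancelled.

    `log_file` must be a real file object with a file descriptor;
    it defaults to sys.stderr. (Hypothetical helper, not from DMD2.)
    """
    target = log_file if log_file is not None else sys.stderr
    faulthandler.dump_traceback_later(
        timeout_s, repeat=True, file=target, exit=False
    )


def disable_hang_dump():
    """Cancel the pending dump once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()


# Example placement around a suspected save step (commented out here):
# enable_hang_dump(300)   # assumption: a save should take well under 5 minutes
# ... code that saves the checkpoint ...
# disable_hang_dump()
```

If the save step is still running when the timeout expires, the dumped traceback shows the call each thread is blocked in, which answers the "which line" question directly. In a multi-process run, each rank would need to call the helper itself.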

icelighting commented 4 months ago

@tianweiy It hangs at "torch.save(feedforward_state_dict, os.path.join(output_path, f"pytorch_model.bin"))", the second location you posted. But it is solved: I just changed the torch version to match the one in the requirements file. Thanks.
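
For anyone hitting the same hang, a quick check of the installed torch version against the pinned one may help. This is a minimal sketch, assuming the requirements file pins packages with "==" lines; the function names are hypothetical, and the version numbers in the sample text are placeholders, not the versions the repository actually pins.

```python
import re
from importlib import metadata
from typing import Optional


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return the version pinned with '==' for `package`, or None if absent."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: Optional[str]) -> bool:
    """Compare versions, ignoring a local build suffix such as '+cu118'."""
    if installed is None or pinned is None:
        return False
    return installed.split("+")[0] == pinned.split("+")[0]


# Placeholder requirements text; in practice, read the repository's own file.
sample_requirements = "numpy==1.0.0\ntorchvision==0.1.0\ntorch==9.9.9  # placeholder\n"
wanted = pinned_version(sample_requirements, "torch")
print("pinned:", wanted, "installed:", installed_version("torch"),
      "match:", matches_pin(installed_version("torch"), wanted))
```

If the two versions differ, reinstalling torch at the pinned version is the change that resolved the hang in this thread.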