tianweiy / DMD2

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Is there a plan to release code for grad accumulation? #23

Open grovessss opened 5 months ago

tianweiy commented 5 months ago

We don't have code for grad accumulation. You can potentially modify https://github.com/tianweiy/DMD2/blob/main/main/train_sd.py to achieve this, though.

grovessss commented 5 months ago

Sure, thank you! Can I ask why simply setting `gradient_accumulation_steps=args.gradient_accumulation_steps` in the Accelerator definition doesn't work?

tianweiy commented 5 months ago

You will need something like:

https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/examples/text_to_image/train_text_to_image.py#L939

https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/examples/text_to_image/train_text_to_image.py#L1027
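The two linked lines wrap the training step in accelerate's `accumulate()` context manager and gate logging/EMA updates on `accelerator.sync_gradients`, so the optimizer only steps once every N micro-batches. A framework-free sketch of that accumulate-then-step pattern, using a hypothetical `sgd_with_accumulation` helper and a toy quadratic loss (an illustration of the idea, not DMD2's actual training code):

```python
def sgd_with_accumulation(param, micro_batches, lr, accum_steps):
    """Accumulate gradients over `accum_steps` micro-batches, then step once.

    Toy setup: scalar parameter, per-sample loss (param - x)**2, so the
    per-sample gradient is 2 * (param - x). Mirrors what
    accelerator.accumulate() does: gradients pile up across micro-batches
    and the optimizer step only fires on sync boundaries.
    """
    grad_sum = 0.0
    for i, x in enumerate(micro_batches, start=1):
        grad_sum += 2.0 * (param - x)          # backward() on a micro-batch
        if i % accum_steps == 0:               # sync boundary reached
            param -= lr * grad_sum / accum_steps  # step with averaged gradient
            grad_sum = 0.0                     # zero_grad() between steps
    return param


# Two optimizer steps over four micro-batches (accum_steps=2):
final = sgd_with_accumulation(0.0, [1.0, 2.0, 3.0, 4.0], lr=0.1, accum_steps=2)
```

With `accelerate`, the same gating happens automatically inside `with accelerator.accumulate(model):`, and you check `accelerator.sync_gradients` before anything that should run only once per real optimizer step (gradient clipping, EMA, logging).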

grovessss commented 5 months ago

Thank you very much!

Maybe it's not so relevant to your method, but I was wondering: with gradient accumulation, is there anything special that needs to be done for the different-frequency optimization in your method, or can accelerate handle it all?