hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

Out-of-memory for default config. #409

Closed: chehx closed this issue 3 months ago

chehx commented 4 months ago

Many thanks for open-sourcing this great project.

I am currently hitting an out-of-memory error during training.

I use the default training config in stage3.py on 2 A100 80G GPUs.

Training raises the OOM error, even though report 1.1 says the default config is meant for 80G of memory.

For now, when I use 480p with 48 frames, it takes around 73GB.

JamesTensor commented 4 months ago

I read report 1.1 and it does not state that only 80G of memory is required for training. Where did you see that?

chehx commented 4 months ago

> I read report 1.1 and it does not state that only 80G of memory is required for training. Where did you see that?

https://github.com/hpcaitech/Open-Sora/issues/344#issuecomment-2102359347

Honestly, I saw it in the response linked above. Did I misunderstand something?

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 7 days with no activity.

chehx commented 4 months ago

I found the problem!

When I load the pre-trained weights from Hugging Face, the config file from the Hub is loaded as well:

https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3/blob/main/config.json

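where both options are disabled:

"enable_flash_attn": false, "enable_layernorm_kernel": false,

So flash attention and the layernorm kernel were turned off by the loaded config, which explains the extra memory usage!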
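For anyone who runs into the same thing, here is a minimal sketch of how one might check for the two disabled flags and override them before building the model. It only uses a plain dict and the two keys quoted above; `disabled_flags` and `with_overrides` are hypothetical helper names, and how Open-Sora itself merges the Hub config with the training config is not shown here, so treat this as an illustration rather than the project's actual loading code.

```python
import json

# The two flags as quoted above from the Hugging Face config.json.
HUB_CONFIG_JSON = '{"enable_flash_attn": false, "enable_layernorm_kernel": false}'

# Keys assumed to affect memory usage, taken from this thread.
MEMORY_FLAGS = ("enable_flash_attn", "enable_layernorm_kernel")


def disabled_flags(config):
    """Return the memory-related flags that are missing or set to false."""
    return [name for name in MEMORY_FLAGS if not config.get(name, False)]


def with_overrides(config, overrides):
    """Return a copy of config with overrides applied on top (hypothetical helper)."""
    merged = dict(config)
    merged.update(overrides)
    return merged


hub_config = json.loads(HUB_CONFIG_JSON)
print("disabled in hub config:", disabled_flags(hub_config))

# Force both flags on, leaving the original dict untouched.
effective = with_overrides(hub_config, {name: True for name in MEMORY_FLAGS})
print("disabled after override:", disabled_flags(effective))
```

Running a check like this right after the config is loaded makes it obvious when the pre-trained checkpoint's settings differ from what the training config asks for.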

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 7 days with no activity.

JThh commented 3 months ago

I am going to close this issue since it appears to have been resolved by the original poster.