hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
20.34k stars, 1.92k forks

[For the record] Solution: ImportError: /data/venvs/sora/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi #472

TalentBoy2333 closed this issue 2 weeks ago

TalentBoy2333 commented 2 weeks ago

Running:

python3 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \
  --prompt "a beautiful waterfall"

fails with this error:

Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.37s/it]
  0%|          | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/data/projects/aigc/Open-Sora/scripts/inference.py", line 303, in <module>
    main()
  File "/data/projects/aigc/Open-Sora/scripts/inference.py", line 265, in main
    samples = scheduler.sample(
  File "/data/venvs/sora/lib/python3.10/site-packages/opensora/schedulers/rf/__init__.py", line 88, in sample
    pred = model(z_in, t, **model_args).chunk(2, dim=1)[0]
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/opensora/models/stdit/stdit3.py", line 404, in forward
    x = auto_grad_checkpoint(spatial_block, x, y, t_mlp, y_lens, x_mask, t0_mlp, T, S)
  File "/data/venvs/sora/lib/python3.10/site-packages/opensora/acceleration/checkpoint.py", line 24, in auto_grad_checkpoint
    return module(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/opensora/models/stdit/stdit3.py", line 123, in forward
    x_m = self.attn(x_m)
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/venvs/sora/lib/python3.10/site-packages/opensora/models/layers/blocks.py", line 189, in forward
    from flash_attn import flash_attn_func
  File "/data/venvs/sora/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/data/venvs/sora/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /data/venvs/sora/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

Fixed by following https://github.com/Dao-AILab/flash-attention/issues/966
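For readers wondering what the missing symbol is, here is a minimal sketch that pulls the mangled name out of the error message and splits it into its namespace parts. The helper names are hypothetical, and the decoder only handles the simple `_ZN<len><name>...E` pattern seen here; `c++filt` is the proper tool for real demangling. The result, `c10::cuda::SetDevice`, sits in the `c10` namespace used by PyTorch's core library, which suggests the flash-attn binary was built against a different PyTorch build than the one installed.

```python
import re
from typing import List, Optional


def extract_undefined_symbol(message: str) -> Optional[str]:
    """Return the mangled symbol named in an 'undefined symbol' error, if any."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


def nested_name_parts(symbol: str) -> List[str]:
    """Split a simple Itanium-mangled nested name into its components.

    Simplified on purpose: it reads length-prefixed identifiers after '_ZN'
    and stops at the first non-digit (the closing 'E'). Templates and
    substitutions are not handled.
    """
    if not symbol.startswith("_ZN"):
        return []
    parts: List[str] = []
    i = 3
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    return parts


if __name__ == "__main__":
    error = (
        "flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: "
        "undefined symbol: _ZN3c104cuda9SetDeviceEi"
    )
    symbol = extract_undefined_symbol(error)
    print("::".join(nested_name_parts(symbol or "")))
```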

Python 3.10.14, CUDA 12.1, Ubuntu 22.04.4 LTS
torch==2.3.0 with flash-attn==2.5.8 works (2.5.9.post1 fails the same way)
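A quick way to compare an environment against that combination is sketched below. It assumes the distributions are registered under the names `torch` and `flash-attn`, and the function names are hypothetical. The "known good" pair is only what this thread reports; other pairs may work too.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# The only combination reported as working in this thread.
KNOWN_GOOD = {"torch": "2.3.0", "flash-attn": "2.5.8"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu121'."""
    return version.split("+", 1)[0]


def matches_known_good(versions: Dict[str, Optional[str]]) -> bool:
    """True only if every package in KNOWN_GOOD is installed at that version."""
    for name, expected in KNOWN_GOOD.items():
        found = versions.get(name)
        if found is None or base_version(found) != expected:
            return False
    return True


if __name__ == "__main__":
    found = installed_versions(KNOWN_GOOD)
    print(found, "matches reported pair:", matches_known_good(found))
```

If the check fails, pinning flash-attn to the reported version (for example `pip install flash-attn==2.5.8`) is the fix described above.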

zhengzangw commented 2 weeks ago

Thanks for sharing!

Another workaround is:

python scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 4s --resolution 720p \
  --layernorm-kernel False --flash-attn False \
  --prompt "a beautiful waterfall"
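The same fallback can be automated in a wrapper: try importing `flash_attn` and add the two disabling flags only when the import fails. This is a sketch, not part of Open-Sora. The function names are hypothetical, and the script path, config path, and flags are copied from the command above. Note that, as reported later in this thread, running 720p without flash-attn can run out of GPU memory.

```python
import importlib
from typing import List


def flash_attn_usable() -> bool:
    """Return True if `flash_attn` imports cleanly in this environment."""
    try:
        importlib.import_module("flash_attn")
    except (ImportError, OSError):
        # Covers both a missing package and a binary with an undefined symbol.
        return False
    return True


def inference_args(prompt: str, use_flash_attn: bool) -> List[str]:
    """Build the argument list for scripts/inference.py."""
    args = [
        "scripts/inference.py",
        "configs/opensora-v1-2/inference/sample.py",
        "--num-frames", "4s",
        "--resolution", "720p",
    ]
    if not use_flash_attn:
        # Flags taken from the workaround command in this thread.
        args += ["--layernorm-kernel", "False", "--flash-attn", "False"]
    args += ["--prompt", prompt]
    return args


if __name__ == "__main__":
    print(" ".join(["python"] + inference_args("a beautiful waterfall", flash_attn_usable())))
```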

TalentBoy2333 commented 2 weeks ago

I tried turning off flash-attn, and it hit a CUDA OOM on 2 A100s.

zhengzangw commented 2 weeks ago

Yes, without flash-attn, 720p costs too much memory.

TalentBoy2333 commented 2 weeks ago

Maybe I should make more money and buy some GPUs that don't exist in China. :doge