hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
20.34k stars 1.92k forks

Running the demo on an AutoDL cloud server after configuring the files fails with an error #432

Closed Fang-zhixian closed 2 weeks ago

Fang-zhixian commented 4 weeks ago

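I am using an AutoDL cloud server. After setting up the files I ran the demo, and it failed with an error. The server configuration is as follows:
Image: PyTorch 2.3.0, Python 3.12 (ubuntu22.04), Cuda 12.1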

GPU: A800-80GB (80GB) * 1
CPU: 14 vCPU Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz
Memory: 100GB
Disk: system disk 30 GB; data disk 50GB free, 0GB paid

The error output is as follows:
(base) root@autodl-container-d293479255-b810ea53:~/Open-Sora# torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt
Fatal Python error: Segmentation fault

Current thread 0x00007fb5899cf740 (most recent call first):
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py", line 113 in _call_store
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py", line 64 in __init__
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py", line 259 in create_backend
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/registry.py", line 36 in _create_c10d_handler
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/api.py", line 263 in create_handler
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/rendezvous/registry.py", line 66 in get_rendezvous_handler
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 235 in launch_agent
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132 in __call__
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 870 in run
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 879 in main
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347 in wrapper
  File "/root/miniconda3/bin/torchrun", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 20)
Segmentation fault (core dumped)
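The trace above shows the interpreter dying inside the torchrun launcher before the inference script starts. One way to narrow down where a crash like this first appears is to run small snippets in a fresh interpreter and check how each one exits. The following is a minimal sketch using only the Python standard library; the helper name `probe` is hypothetical and not part of Open-Sora or PyTorch, and the signal check assumes a POSIX system such as the Ubuntu image described above.

```python
import signal
import subprocess
import sys


def probe(snippet: str, timeout: float = 120.0) -> str:
    """Run `snippet` in a fresh interpreter and report how that process exited.

    `probe` is a hypothetical helper for illustration only.
    """
    result = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a fatal signal.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with code {result.returncode}"
```

Calling `probe("import torch")` and then `probe("import torch.distributed")` would show whether the plain import already crashes or only the distributed components do. A result of `killed by SIGSEGV` for a given snippet suggests a problem in the installed binaries rather than in the Open-Sora scripts, though it does not identify the cause by itself.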

JThh commented 3 weeks ago

Can you try the following commands, and send me your error traces (if any)?

gdb python3
r -c "import torch"
bt

supercooledith commented 3 weeks ago

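Hello, this error is most likely an environment problem. We provide an Open Sora 1.1 image on Luchen Cloud (潞晨云, https://cloud.luchentech.com), and you are welcome to use it 🫶🏻 https://cloud.luchentech.com/doc/docs/image/Open-Sora%201.1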

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 7 days with no activity.

zhengzangw commented 2 weeks ago

Luchen Cloud (潞晨云) has updated its image to OpenSora 1.2. Please try again with 1.2: [Luchen Cloud | OpenSora image | video tutorial]