hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
21.48k stars 2.06k forks

ERROR: Unexpected segmentation fault encountered in worker. #370

Closed howardgriffin closed 3 months ago

howardgriffin commented 3 months ago

When running the v1.1 training (using buckets), I encountered this error. Any suggestions?

Traceback (most recent call last):
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/root/miniconda3/envs/opensora/lib/python3.10/queue.py", line 180, in get
    self.not_empty.wait(remaining)
  File "/root/miniconda3/envs/opensora/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1871778) is killed by signal: Segmentation fault.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Open-Sora/scripts/train.py", line 330, in <module>
    main()
  File "/Open-Sora/scripts/train.py", line 239, in main
    for step, batch in pbar:
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/tqdm/std.py", line 1169, in __iter__
    for obj in iterable:
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
    idx, data = self._get_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1285, in _get_data
    success, data = self._try_get_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 1871778) exited unexpectedly
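
The traceback only shows that a worker process died, not which sample caused it. One general way to narrow this down (not something taken from the Open-Sora code) is to load samples one at a time in the main process with `faulthandler` enabled, so that a crash in native code at least dumps the Python stack. A minimal sketch, where `find_bad_samples` is a hypothetical helper and `dataset` is assumed to be any object supporting `len()` and integer indexing:

```python
import faulthandler


def find_bad_samples(dataset, limit=None, verbose=False):
    """Load samples one by one in the main process and collect failures.

    Hypothetical debugging helper: `dataset` is assumed to support len()
    and integer indexing, as map-style datasets do.
    """
    try:
        # On a segfault, dump the Python stack to stderr before the process dies.
        faulthandler.enable()
    except (AttributeError, ValueError, RuntimeError, OSError):
        # Some environments replace stderr with an object lacking a file descriptor.
        pass

    total = len(dataset) if limit is None else min(limit, len(dataset))
    failures = []
    for index in range(total):
        if verbose:
            # Print before loading so the last index shown is the one that crashed.
            print(f"loading sample {index}", flush=True)
        try:
            dataset[index]
        except Exception as exc:  # ordinary Python errors are recorded, not fatal
            failures.append((index, repr(exc)))
    return failures
```

With `verbose=True`, the last index printed before a hard crash points at the suspect video file. Ordinary exceptions are returned as `(index, message)` pairs instead of stopping the scan.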

zhengzangw commented 3 months ago

Could you share a few rows from your CSV file, along with the command you ran? The CSV must be preprocessed so that each video entry includes its height, width, and related metadata.
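
A quick way to check the CSV before training is a small script like the one below. This is only a sketch: the column names `path`, `height`, and `width` are assumptions for illustration, and `check_metadata_csv` is a hypothetical helper, so adjust the names to match what your preprocessing actually writes.

```python
import csv

# Assumed column names; change these to match your processed CSV.
REQUIRED_COLUMNS = ("path", "height", "width")


def check_metadata_csv(csv_path, required=REQUIRED_COLUMNS):
    """Return (missing_columns, bad_row_numbers) for a metadata CSV."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in required if name not in header]
        bad_rows = []
        if not missing:
            # Row 1 is the header, so data rows start at 2.
            for row_number, row in enumerate(reader, start=2):
                # A short row yields None for absent fields; treat it as empty.
                if any(not (row[name] or "").strip() for name in required):
                    bad_rows.append(row_number)
    return missing, bad_rows
```

An empty `missing` list and an empty `bad_rows` list mean every row carries the assumed fields. It does not confirm that the video files themselves decode correctly.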

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.