modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

Error when running bash run.sh #36

Closed von-321 closed 10 months ago

von-321 commented 10 months ago

I am running run.sh from sv-cam++ with only one GPU, and it fails at Stage 3. Is this a Python problem? Any advice would be appreciated.

    Stage3: Training the speaker model...
    /root/miniconda3/envs/3D-Speaker/bin/python: can't open file 'speakerlab/bin/train.py': [Errno 20] Not a directory
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 3209) of binary: /root/miniconda3/envs/3D-Speaker/bin/python
    Traceback (most recent call last):
      File "/root/miniconda3/envs/3D-Speaker/bin/torchrun", line 8, in <module>
        sys.exit(main())
      File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
      File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
        run(args)
      File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
        elastic_launch(
      File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    speakerlab/bin/train.py FAILED
    Failures:
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2023-11-02_16:34:10
      host      : autodl-container-9ee2119752-04687cb0
      rank      : 0 (local_rank: 0)
      exitcode  : 2 (pid: 3209)
      error_file:
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================

wanghuii1 commented 10 months ago

You could check whether the file speakerlab/bin/train.py exists under the directory that contains run.sh.
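
The check suggested here can be made a little more specific. On Linux, "[Errno 20] Not a directory" (ENOTDIR) means that some leading component of the path exists but is not a directory, which is different from the file simply being missing. The following is a minimal sketch, not part of 3D-Speaker: the helper name first_non_directory is hypothetical, and the idea that speakerlab itself might be the offending component (for example a symlink that was checked out as a plain text file) is an assumption the thread does not confirm.

```python
import os


def first_non_directory(path):
    """Return the first leading component of `path` that exists but is
    not a directory (following symlinks), or None if there is none.

    Hypothetical helper for illustration only.
    """
    parts = os.path.normpath(path).split(os.sep)
    # Start from the filesystem root for absolute paths, else from the cwd.
    current = os.sep if os.path.isabs(path) else ""
    # Every component except the last one must be a directory.
    for part in parts[:-1]:
        if part == "":
            continue
        current = os.path.join(current, part)
        # lexists() is True for broken symlinks too; isdir() follows links.
        if os.path.lexists(current) and not os.path.isdir(current):
            return current
    return None


if __name__ == "__main__":
    # Run this from the directory that contains run.sh.
    culprit = first_non_directory("speakerlab/bin/train.py")
    if culprit is None:
        print("no non-directory component found in the path")
    else:
        print("not a directory:", culprit, "(is symlink:", os.path.islink(culprit), ")")
```

If the helper reports speakerlab, the entry there is a regular file or a broken link rather than a usable directory, which would explain why the file appears to "exist" in a listing while Python still cannot open it.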

von-321 commented 10 months ago

> You could check whether the file speakerlab/bin/train.py exists under the directory that contains run.sh.

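I checked, and the file does exist. To rule out a path problem, I replaced the speakerlab/bin/train.py path in run.sh with an absolute path, but I still get the same error, and the path shown in the error message is still not the absolute one.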

wanghuii1 commented 10 months ago

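You could try the following: 1. Check the permissions on the .py file and make sure it is executable. 2. Alternatively, the speakerlab entry in the run.sh directory is a symbolic link; you could try copying the directory it points to, 3D-Speaker/speakerlab, and replacing the link with that copy.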

von-321 commented 10 months ago

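> You could try the following: 1. Check the permissions on the .py file and make sure it is executable. 2. Alternatively, the speakerlab entry in the run.sh directory is a symbolic link; you could try copying the directory it points to, 3D-Speaker/speakerlab, and replacing the link with that copy.

I followed your suggestion and the problem is solved. Thank you very much!!!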
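
For readers who hit the same error, the second suggestion in this thread (replacing the speakerlab symlink with a real copy of 3D-Speaker/speakerlab) could be done roughly as follows. This is a sketch under assumptions: the function name replace_symlink_with_copy is hypothetical, and it only applies when speakerlab really is a symbolic link whose target directory exists. The thread does not say which of the two suggestions actually fixed the problem.

```python
import os
import shutil


def replace_symlink_with_copy(link_path):
    """Replace the symlink at `link_path` with a real copy of the
    directory it points to. Returns the resolved target path.

    Hypothetical helper for illustration only.
    """
    if not os.path.islink(link_path):
        raise ValueError(f"{link_path} is not a symbolic link")
    target = os.path.realpath(link_path)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"link target {target} is not an existing directory")
    # unlink() removes only the link itself; the target is left untouched.
    os.unlink(link_path)
    shutil.copytree(target, link_path)
    return target


if __name__ == "__main__":
    # Run this from the directory that contains run.sh.
    print("copied from:", replace_symlink_with_copy("speakerlab"))
```

If speakerlab turns out to be a regular file instead of a link, the function raises ValueError; in that case the directory has to be copied in by hand from the repository root.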