modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Single-node multi-GPU training: GPU memory is allocated only on GPU 0, the other GPUs show no memory usage #615

Closed JianweiSun007 closed 1 year ago

JianweiSun007 commented 1 year ago

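As the title says, I used the example under FunASR/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/ for single-node multi-GPU training, launched with CUDA_VISIBLE_DEVICES=1,2 python -m torch.distributed.launch --nproc_per_node 2 --use_env finetune.py. The run failed with a GPU memory error. While debugging I found that all of the memory was concentrated on GPU 0 and the other GPU was not being used. What could be causing this?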

hnluo commented 1 year ago

Please ask your question in the following format:
OS: [e.g. linux]
Python/C++ Version:
Package Version: pytorch, torchaudio, modelscope, funasr version (pip list)
Model:
Command:
Details:
Error log:

apple2333cream commented 7 months ago

I ran into this problem too. Switching to torchrun gives the same result: everything still ends up on GPU 0. Has anyone managed to solve this?
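The thread does not record a confirmed cause. One common reason for this symptom in PyTorch distributed training generally (not verified against this repository's finetune.py) is that each worker process never binds itself to its own GPU, so every process allocates on the default device. Both python -m torch.distributed.launch --use_env and torchrun export a LOCAL_RANK environment variable to each worker. The sketch below shows how a training script could read it and select the matching device. The helper names resolve_local_device and bind_process_to_gpu are hypothetical and are not part of FunASR or modelscope.

```python
import os


def resolve_local_device(env=None):
    """Return (local_rank, device string) for this worker process.

    LOCAL_RANK is set by torchrun and by torch.distributed.launch --use_env.
    Falls back to rank 0 when the variable is absent (single-process run).
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return local_rank, f"cuda:{local_rank}"


def bind_process_to_gpu(env=None):
    """Hypothetical helper: make this process use its own GPU by default."""
    local_rank, device = resolve_local_device(env)
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return device
    # Indices are relative to CUDA_VISIBLE_DEVICES: with
    # CUDA_VISIBLE_DEVICES=1,2 the process sees cuda:0 and cuda:1.
    if torch.cuda.is_available() and local_rank < torch.cuda.device_count():
        torch.cuda.set_device(local_rank)
    return device


if __name__ == "__main__":
    print(f"this worker will use {bind_process_to_gpu()}")
```

If a script along these lines is used, the call would need to happen before any model or tensor is moved to the GPU. Checkpoints saved from GPU 0 can also pull memory back onto that device when loaded, so passing map_location to torch.load (for example the device string returned above) is worth checking as well. Whether either applies to this example is an assumption, not something the thread confirms.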