ruizhecao96 / CMGAN

Conformer-based Metric GAN for speech enhancement
MIT License
309 stars, 60 forks

RuntimeError: NCCL error in: /……/.cpp:957, invalid usage, NCCL version 21.0.3 #25

Closed: LiuBurger closed this issue 1 year ago

LiuBurger commented 1 year ago

RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1634272204863/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3
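
For anyone landing here with the same message: the thread does not identify the root cause, but one common trigger for NCCL "invalid usage" is a mismatch between the number of launched processes and the GPUs actually visible (for example, two ranks mapped to the same device). The sketch below is only a pre-flight check under that assumption, for a single-node run. The function name `find_ddp_launch_problems` and its parameters are hypothetical and are not part of CMGAN.

```python
import torch


def find_ddp_launch_problems(world_size, local_rank, n_gpus=None):
    """Return a list of human-readable problems with a single-node DDP launch.

    Hypothetical helper (not part of CMGAN). Assumes one process per GPU
    on a single machine. `n_gpus` defaults to the number of visible CUDA
    devices and can be overridden for testing.
    """
    if n_gpus is None:
        n_gpus = torch.cuda.device_count()

    problems = []
    if n_gpus == 0:
        # The NCCL backend only works with CUDA devices.
        problems.append("no CUDA device is visible; the NCCL backend needs GPUs")
    elif world_size > n_gpus:
        # More processes than GPUs means at least two ranks share a device.
        problems.append(
            f"world_size={world_size} exceeds visible GPUs ({n_gpus})"
        )
    if n_gpus > 0 and local_rank >= n_gpus:
        # This rank would try to select a device index that does not exist.
        problems.append(
            f"local_rank={local_rank} has no matching GPU (only {n_gpus} visible)"
        )
    return problems


if __name__ == "__main__":
    # Example: a 4-process launch on whatever GPUs this machine exposes.
    for problem in find_ddp_launch_problems(world_size=4, local_rank=0):
        print("launch problem:", problem)
```

If the check comes back clean, setting the environment variable `NCCL_DEBUG=INFO` before launching usually makes NCCL print more detail about what it rejected.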

utyanndesu commented 1 year ago

I have also encountered this problem. Have you resolved it?

LiuBurger commented 1 year ago

Yes, I have, but with a simple workaround: I used DataParallel instead of DistributedDataParallel.
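
A minimal sketch of that workaround, for reference. It assumes a single training process and uses a hypothetical stand-in model (`TinyNet`) rather than the actual CMGAN generator; `wrap_for_single_process` is also a made-up helper name. With `nn.DataParallel` there is no `init_process_group` call and no NCCL process group, so the error above cannot be raised from that path.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model; not the CMGAN architecture."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)


def wrap_for_single_process(model):
    """Move the model to the available device and wrap it in DataParallel
    when more than one GPU is visible. Runs in one process, so no
    torch.distributed setup is needed."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        # DataParallel splits each batch along dim 0 across the visible GPUs
        # and gathers the outputs back on the first device.
        model = nn.DataParallel(model)
    return model


model = wrap_for_single_process(TinyNet())
device = next(model.parameters()).device

# Inputs go to the same device as the model parameters.
batch = torch.randn(16, 8, device=device)
out = model(batch)
print(tuple(out.shape))
```

One thing to watch when adapting existing code: a model wrapped in `nn.DataParallel` exposes the original network as `model.module`, so checkpoints saved from the wrapper have `module.`-prefixed keys.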

utyanndesu commented 1 year ago

Thank you! But after making this replacement, new errors appeared.

LiuBurger commented 1 year ago

You could use the version of the code from before the "Feature Multiple GPU Training" update and modify it to use DataParallel.

utyanndesu commented 1 year ago

Thank you very much for your suggestion, but as a beginner, I have encountered too many problems while modifying the code, which I have not been able to solve so far.