p-lambda / jukemir

Perform transfer learning for MIR using Jukebox!
MIT License

RuntimeError: Failed to initialize NCCL #8

Closed: KevinGoodman closed this issue 1 year ago

KevinGoodman commented 1 year ago

Thanks for sharing the code. In the provided Colab notebook, I got this error during model setup. Can anyone help, please? @rodrigo-castellon

I'm using CUDA 11.1 on an RTX 3090, and my PyTorch version is 1.12.1.

rodrigo-castellon commented 1 year ago

Hi, in the remote Colab environment this happens when you rerun the setup code within the same runtime. When it does, I usually just restart the runtime and run everything again from the top.
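If the setup code initializes PyTorch's default process group (an assumption here; Jukebox's distributed setup is built on `torch.distributed`), rerunning it in the same runtime asks PyTorch to create that group a second time, which it refuses. A minimal sketch of an idempotent guard is below. The helper name `init_single_process_group`, the single-process rank/world size, and the `MASTER_ADDR`/`MASTER_PORT` defaults are all hypothetical choices, not part of Jukebox or jukemir.

```python
import os

import torch.distributed as dist


def init_single_process_group(backend: str = "nccl") -> bool:
    """Hypothetical helper: create a 1-process default group at most once per runtime.

    Returns True if this call created the group, False if one already existed.
    """
    if not dist.is_available():
        raise RuntimeError("torch.distributed is not available in this PyTorch build")
    if dist.is_initialized():
        # A rerun of the setup cell lands here instead of trying to create a
        # second default process group, which PyTorch rejects with an error.
        return False
    # Single-process rendezvous through environment variables (assumed values;
    # existing settings are left untouched).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend=backend, init_method="env://", rank=0, world_size=1)
    return True
```

Restarting the runtime has the same effect, since it discards the existing group. This guard only covers the rerun case; it does not help if NCCL fails for other reasons, such as a driver or CUDA setup problem.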

If that doesn't work in your VSCode notebook, you could look at some of the related issues on the Jukebox repository for ideas (issue 1, issue 2, issue 3).

Although this is not officially supported, you could also try running this code through the Docker image for Jukebox representation extraction (as-is or modified), which might avoid these issues.

The obvious workaround is to use Colab itself, which should work fine. But I understand that you may specifically want to run this locally.

Hope this helps!

KevinGoodman commented 1 year ago

Thanks for the quick reply. My bad for not searching thoroughly...

suremangood commented 6 months ago

I've also encountered this problem. Does anyone have any advice?