Importing PyTorch 2.2 fails with undefined symbol error: ncclCommRegister

rosario-purple commented 5 months ago

🐛 Describe the bug

When I upgrade to PyTorch 2.2 via Pip, importing torch fails with an undefined symbol error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

Downgrading to Torch 2.1.2 fixed the problem. My best guess is that this is because I have MS-AMP installed (https://github.com/Azure/MS-AMP) which is pinned to an older version of NCCL (https://github.com/Azure/msccl-executor-nccl version 2.17.1), while PyTorch 2.2 depends on a newer version (NCCL 2.19.3).
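
For anyone triaging the same failure, here is a minimal sketch that pulls the shared object and the missing symbol out of the `ImportError` message shown in the traceback. The helper names `parse_undefined_symbol` and `check_torch_import` are made up for illustration; the sketch only assumes the message keeps the `<path>: undefined symbol: <name>` shape seen above.

```python
import re
from typing import Optional, Tuple

# Assumed message shape (taken from the traceback in this issue):
#   "<path to shared object>: undefined symbol: <symbol name>"
_UNDEFINED_SYMBOL = re.compile(
    r"^(?P<library>.+?): undefined symbol: (?P<symbol>\S+)"
)


def parse_undefined_symbol(message: str) -> Optional[Tuple[str, str]]:
    """Return (library_path, symbol) if the message reports an undefined symbol."""
    match = _UNDEFINED_SYMBOL.search(message)
    if match is None:
        return None
    return match.group("library"), match.group("symbol")


def check_torch_import() -> str:
    """Try to import torch and describe the outcome as a one-line string."""
    try:
        import torch
    except ImportError as exc:
        parsed = parse_undefined_symbol(str(exc))
        if parsed is None:
            return f"import failed for another reason: {exc}"
        library, symbol = parsed
        return f"{library} needs {symbol}, which no loaded library provides"
    return f"torch {torch.__version__} imported successfully"
```

Calling `print(check_torch_import())` in the affected environment should name `ncclCommRegister` if the failure is the one reported here.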

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB

Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.3
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
Stepping: 6
BogoMIPS: 4000.04
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3 MiB (96 instances)
L1i cache: 3 MiB (96 instances)
L2 cache: 192 MiB (48 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] numpy==1.24.4
[pip3] numpyro==0.9.2
[pip3] torch==2.2.0
[pip3] torchaudio==2.2.0
[pip3] torchvision==0.17.0
[pip3] triton==2.2.0
[conda] numpy 1.24.4 pypi_0 pypi
[conda] numpyro 0.9.2 pypi_0 pypi
[conda] torch 2.2.0 pypi_0 pypi
[conda] torchaudio 2.2.0 pypi_0 pypi
[conda] torchvision 0.17.0 pypi_0 pypi
[conda] triton 2.2.0 pypi_0 pypi

cc @seemethere @malfet @osalpekar @atalman

ruifengma commented 5 months ago

cuda 12.2 works for me with pytorch 2.2, same python 3.10.13

atalman commented 4 months ago

Works on Ubuntu 22.04 installed via docker pull ubuntu:22.04

torch install:

pip install torch
Collecting torch
  Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting filelock (from torch)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch)
  Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 88.1 MB/s eta 0:00:00
Collecting networkx (from torch)
  Downloading networkx-3.2.1-py3-none-any.whl.metadata (5.2 kB)
Collecting jinja2 (from torch)
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2024.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 82.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 60.2 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 123.7 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 10.1 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 31.5 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 57.2 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch)
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 30.5 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 20.2 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.19.3 (from torch)
  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 8.5 MB/s eta 0:00:00
Collecting triton==2.2.0 (from torch)
  Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch)
  Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 47.2 MB/s eta 0:00:00
Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 755.5/755.5 MB 3.9 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 4.1 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 25.0 MB/s eta 0:00:00
Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 167.9/167.9 MB 23.8 MB/s eta 0:00:00
Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Downloading fsspec-2024.2.0-py3-none-any.whl (170 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.9/170.9 kB 16.7 MB/s eta 0:00:00
Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.2/133.2 kB 13.9 MB/s eta 0:00:00
Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 97.7 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)
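
The log above shows pip resolving `nvidia-nccl-cu12==2.19.3` as a dependency of torch. As a rough sketch, the same pin can be compared with what is actually installed by reading package metadata, without importing torch. The helper names are hypothetical, the distribution name is taken from the log, and pip's recorded version does not prove the `libnccl.so.2` file on disk is unmodified (another package may have overwritten it).

```python
import re
from importlib import metadata
from typing import Iterable, Optional


def pinned_version(requirements: Iterable[str], package: str) -> Optional[str]:
    """Return the version pinned with '==' for `package`, or None if absent.

    Accepts both 'name==1.2.3; marker' and the older 'name (==1.2.3) ; marker'.
    """
    pattern = re.compile(
        rf"^{re.escape(package)}\s*\(?\s*==\s*([^\s;)]+)", re.IGNORECASE
    )
    for requirement in requirements:
        match = pattern.match(requirement.strip())
        if match:
            return match.group(1)
    return None


def nccl_pin_report(torch_dist: str = "torch",
                    nccl_dist: str = "nvidia-nccl-cu12") -> str:
    """Describe the NCCL version torch asks for and the one pip has recorded."""
    try:
        requirements = metadata.requires(torch_dist) or []
    except metadata.PackageNotFoundError:
        return f"{torch_dist} is not installed"
    wanted = pinned_version(requirements, nccl_dist)
    try:
        found = metadata.version(nccl_dist)
    except metadata.PackageNotFoundError:
        found = None
    return f"{torch_dist} pins {nccl_dist}=={wanted}; installed: {found}"
```

Running `print(nccl_pin_report())` in the failing environment gives a quick first hint about whether the pip-level versions agree.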

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.2.0+cu121'

ldd libtorch_cuda.so 
        linux-vdso.so.1 (0x00007ffc415fa000)
        libc10_cuda.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f6d4a04e000)
        libcudart.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007f6d49c00000)
        libcusparse.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007f6d39c00000)
        libcufft.so.11 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cufft/lib/libcufft.so.11 (0x00007f6d2e000000)
        libcusparseLt-f8b4a9fb.so.0 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libcusparseLt-f8b4a9fb.so.0 (0x00007f6d2bc00000)
        libnvToolsExt.so.1 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/nvtx/lib/libnvToolsExt.so.1 (0x00007f6d2b800000)
        libcurand.so.10 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/curand/lib/libcurand.so.10 (0x00007f6d25200000)
        libcublas.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cublas/lib/libcublas.so.12 (0x00007f6d1e800000)
        libcublasLt.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007f6cfc800000)
        libcudnn.so.8 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cudnn/lib/libcudnn.so.8 (0x00007f6cfc400000)
        libnccl.so.2 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/nccl/lib/libnccl.so.2 (0x00007f6cefa00000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6d4a043000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6d4a03c000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6d4a037000)
        libc10.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libc10.so (0x00007f6d49f39000)
        libtorch_cpu.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libtorch_cpu.so (0x00007f6cd85d1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6d49b19000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6cd83a7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6d49f19000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6cd817f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6d7c6f6000)
        libnvJitLink.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x00007f6cd4c00000)
        libgomp-a34b3233.so.1 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libgomp-a34b3233.so.1 (0x00007f6cd4800000)
        libcupti.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007f6cd3e00000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f6d49f10000)

@rosario-purple could you please run ldd /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so on the machine where you are seeing this issue?

malfet commented 4 months ago

I guess there isn't much one can do here other than mark torch as incompatible with msccl-executor-nccl. One can override it with LD_LIBRARY_PATH, though.

Another solution would be to update the bundled NCCL binaries inside msccl-executor-nccl with the ones shipped with PyTorch (not sure it will work, but perhaps worth trying, as NCCL should be forward compatible).
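
A minimal sketch of the LD_LIBRARY_PATH override, under stated assumptions: `nccl_dir` is a hypothetical directory that holds an NCCL 2.19 or newer `libnccl.so.2`, and the helper names are invented for this example. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the sketch launches a fresh interpreter rather than changing the variable in the current one. Whether the override wins depends on how the conflicting NCCL build was installed.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, Mapping


def env_with_library_dir(env: Mapping[str, str], lib_dir: Path) -> Dict[str, str]:
    """Return a copy of `env` with `lib_dir` first in LD_LIBRARY_PATH."""
    new_env = dict(env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; drop empty entries.
    parts = [str(lib_dir)] + [p for p in existing.split(":") if p]
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


def import_torch_in_child(nccl_dir: Path) -> subprocess.CompletedProcess:
    """Run `import torch` in a new interpreter with the override applied."""
    return subprocess.run(
        [sys.executable, "-c", "import torch; print(torch.__version__)"],
        env=env_with_library_dir(os.environ, nccl_dir),
        capture_output=True,
        text=True,
    )
```

For example, `import_torch_in_child(Path("/opt/nccl-2.19/lib"))` (a made-up path) returns a result whose `returncode` and `stderr` show whether the import succeeded with that directory in front.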

atalman commented 4 months ago

cc @ptrblck

Aidyn-A commented 4 months ago

> My best guess is that this is because I have MS-AMP installed (https://github.com/Azure/MS-AMP) which is pinned to an older version of NCCL (https://github.com/Azure/msccl-executor-nccl version 2.17.1), while PyTorch 2.2 depends on a newer version (NCCL 2.19.3).

That is the exact reason why it fails with an undefined symbol. ncclCommRegister was introduced in NCCL v2.19, and PyTorch has been using it since November (https://github.com/pytorch/pytorch/commit/ab1f6d58bc57faa89b74b98a27fc38e90abf8520).
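
To check a particular `libnccl.so.2` directly, something like the following sketch could be used. It loads the library with `ctypes`, asks `ncclGetVersion` for the version, and tests whether `ncclCommRegister` is exported. The helper names are hypothetical, and the version decoding follows my reading of the `NCCL_VERSION` macro in `nccl.h` (releases up to 2.8 encode as major*1000 + minor*100 + patch, later ones as major*10000 + minor*100 + patch), so treat that part as an assumption.

```python
import ctypes
from typing import Optional, Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Decode the integer reported by ncclGetVersion into (major, minor, patch)."""
    if code < 10000:
        # Older scheme (assumed for releases up to 2.8.x).
        return code // 1000, (code % 1000) // 100, code % 100
    return code // 10000, (code % 10000) // 100, code % 100


def inspect_nccl(path: str) -> Optional[dict]:
    """Load the NCCL library at `path`; return None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    version = None
    # hasattr() on a CDLL is False when the symbol is not exported.
    if hasattr(lib, "ncclGetVersion"):
        code = ctypes.c_int(0)
        if lib.ncclGetVersion(ctypes.byref(code)) == 0:  # 0 is ncclSuccess
            version = decode_nccl_version(code.value)
    return {
        "version": version,
        "has_ncclCommRegister": hasattr(lib, "ncclCommRegister"),
    }
```

Pointing `inspect_nccl` at the `libnccl.so.2` path reported by `ldd` should show a version below (2, 19, x) and `has_ncclCommRegister: False` if an older NCCL build is in place.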

lucasjinreal commented 4 months ago

Yes, this should be communicated to as many users as possible. From what I can see, it breaks every torch import when NCCL is older than 2.19, which is still commonly used.

Also, NCCL recently had a bug that affected parallel training with torch. One fix is to upgrade NCCL, but users may have upgraded NCCL and still be linking against the wrong copy.

Please write a guide to help users resolve NCCL-related issues. Thank you!

rosario-purple commented 4 months ago

@atalman Sure, here's the output:

(brr) alyssavance@7e72bd4e-01:/scratch/brr$ ldd /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
        linux-vdso.so.1 (0x00007ffc5f7c1000)
        libc10_cuda.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libc10_cuda.so (0x00001541cbd1a000)
        libcudart.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00001541cba00000)
        libcusparse.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00001541bba00000)
        libcurand.so.10 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00001541b5400000)
        libcufft.so.11 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00001541a9800000)
        libnvToolsExt.so.1 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/nvtx/lib/libnvToolsExt.so.1 (0x00001541a9400000)
        libcudnn.so.8 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 (0x00001541a9000000)
        libnccl.so.2 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x0000154198e00000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00001541cbd07000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00001541cbd02000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00001541cbcfd000)
        libc10.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libc10.so (0x00001541cb922000)
        libtorch_cpu.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so (0x0000154181ee8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00001541bb919000)
        libcublas.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x000015417b600000)
        libcublasLt.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x0000154159600000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00001541593d4000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00001541cbcd9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00001541591ab000)
        /lib64/ld-linux-x86-64.so.2 (0x00001541f765b000)
        libnvJitLink.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x0000154156000000)
        libgomp-a34b3233.so.1 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x0000154155c00000)
        libcupti.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x0000154155200000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00001541cbcd2000)
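
Both `ldd` listings in this thread resolve `libnccl.so.2` to the same relative location under `site-packages/nvidia/nccl/lib`, so the path alone does not reveal which NCCL version the file contains. Still, a small parser makes it easy to pull that path out for further inspection. This is a sketch with a made-up helper name, assuming the usual `name => path (address)` line format.

```python
import posixpath
import re
from typing import Dict, Optional

# "libfoo.so.1 => /some/path/libfoo.so.1 (0x0000...)"
_RESOLVED = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)\s+\(0x[0-9a-fA-F]+\)\s*$"
)
# "libfoo.so.1 => not found"
_MISSING = re.compile(r"^\s*(?P<name>\S+)\s+=>\s+not found\s*$")


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each library name to its resolved path (None if not found).

    Paths are normalized textually ('..' segments collapsed); symlinks
    are not followed.
    """
    resolved: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        match = _RESOLVED.match(line)
        if match:
            resolved[match.group("name")] = posixpath.normpath(match.group("path"))
            continue
        match = _MISSING.match(line)
        if match:
            resolved[match.group("name")] = None
    return resolved
```

With the output above saved in a string, `parse_ldd(text)["libnccl.so.2"]` gives the file whose contents are worth checking.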

mvsjober commented 4 months ago

This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1: https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/

ptrblck commented 4 months ago

> This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1

NCCL uses CUDA 12.2 to build its binaries and statically links the CUDART to them. This is a common approach and will not cause any incompatibilities. In PyTorch we are depending on the NCCL PyPI wheel using the same toolchain. Could you explain why it's highly problematic?

mvsjober commented 4 months ago

> > This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1
>
> NCCL uses CUDA 12.2 to build its binaries and statically links the CUDART to them. This is a common approach and will not cause any incompatibilities. In PyTorch we are depending on the NCCL PyPI wheel using the same toolchain. Could you explain why it's highly problematic?

I'm building a container with PyTorch and I've always kept the CUDA rpms at the same version as the one the PyTorch binaries were linked against. I just assumed it would cause problems if PyTorch itself were linked against a different CUDA version.

In fact I had some problems after switching to CUDA 12.2, but now it turns out this was an unrelated thing. So maybe it will work...

lucasjinreal commented 4 months ago

This issue doesn't usually occur, because the system CUDA installation can normally satisfy what torch was linked against. With CUDA 12.2, however, some functions that torch uses may not be found.

dominicklee commented 3 months ago

Hello all, I had the same problem myself. I am posting this to hopefully help anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on my Windows workstation PC which has CUDA 12.4. This is supposed to be the latest installation. I'm using Ubuntu 22.04 as well, so I am running in WSL2. Now, the problem was that I've tried pip uninstalling and reinstalling PyTorch to no avail. Every time I try running PyTorch in Python, I would get this error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

I am aware that at the moment, PyTorch was built for CUDA 12.1, but I've got it to work after some hours of troubleshooting. Here is what ultimately worked for me:

1. First, uninstall all the PyTorch packages using pip. Do the same with and without the sudo command:

sudo pip3 uninstall -y torch torchvision torchaudio
pip3 uninstall -y torch torchvision torchaudio
pip3 cache purge

2. Install NCCL (NVIDIA Collective Communications Library) for CUDA 12.4. That is NCCL 2.20.5, which was released on March 5th, 2024. You can find it on the Nvidia website: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
3. Next, you'll need to install Nvidia cuDNN. Even if you think you have it, do the steps again. You can go to [Nvidia's cuDNN download page](https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) for instructions.
4. Finally, the last but most important step is to reinstall PyTorch, this time using the nightly build so that we get the latest version:
```
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
```

At the time of writing, I am running on CUDA 12.4 with PyTorch working now. Here's what it might look like:

import torch
import torchvision
import torchaudio
print(torch.__version__)
print(torchvision.__version__)
print(torchaudio.__version__)
print(torch.cuda.is_available())

Output:

2.4.0.dev20240326+cu121
0.19.0.dev20240327+cu121
2.2.0.dev20240327+cu121
True

Wishing everyone the best! And hopefully PyTorch would provide a stable version for CUDA 12.4 users. Happy coding.

harrishyp commented 3 months ago

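> Hello all, I had the same problem myself. I am posting this to hopefully help anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on my Windows workstation PC which has CUDA 12.4. This is supposed to be the latest installation. I'm using Ubuntu 22.04 as well, so I am running in WSL2. Now, the problem was that I've tried pip uninstalling and reinstalling PyTorch to no avail. Every time I try running PyTorch in Python, I would get this error:
>
> >>> import torch
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
>     from torch._C import *  # noqa: F403
> ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister
>
> I am aware that at the moment, PyTorch was built for CUDA 12.1, but I've got it to work after some hours of troubleshooting. Here is what ultimately worked for me:
>
> First, uninstall all the PyTorch packages using pip. Do the same with and without the sudo command:
>
> sudo pip3 uninstall -y torch torchvision torchaudio
> pip3 uninstall -y torch torchvision torchaudio
> pip3 cache purge
>
> Install nccl (Nvidia Collective Communications lib) for CUDA 12.4. Basically, it's NCCL 2.20.5, which was released on March 5th, 2024. You can find it on the Nvidia website as follows: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
>
> Next, you'll need to install Nvidia cuDNN. Even if you think you have it, do the steps again. You can go to Nvidia's cuDNN download page for instructions.
>
> Finally, the last but most important step is to reinstall PyTorch. Except use the nightly build so that we get the latest version:
>
> pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
>
> At the time of writing, I am running on CUDA 12.4 with PyTorch working now. Here's what it might look like:
>
> import torch
> import torchvision
> import torchaudio
> print(torch.__version__)
> print(torchvision.__version__)
> print(torchaudio.__version__)
> print(torch.cuda.is_available())
>
> Output:
>
> 2.4.0.dev20240326+cu121
> 0.19.0.dev20240327+cu121
> 2.2.0.dev20240327+cu121
> True
>
> Wishing everyone the best! And hopefully PyTorch would provide a stable version for CUDA 12.4 users. Happy coding.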

I encountered the same problem and successfully used the method you provided. Thank you

VictorNanka commented 2 months ago

> Hello all, I had the same problem myself. I am posting this to hopefully help anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on my Windows workstation PC which has CUDA 12.4. This is supposed to be the latest installation. I'm using Ubuntu 22.04 as well, so I am running in WSL2. Now, the problem was that I've tried pip uninstalling and reinstalling PyTorch to no avail. Every time I try running PyTorch in Python, I would get this error:
>
> >>> import torch
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
>     from torch._C import *  # noqa: F403
> ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister
>
> I am aware that at the moment, PyTorch was built for CUDA 12.1, but I've got it to work after some hours of troubleshooting. Here is what ultimately worked for me:
>
> 1. First, uninstall all the PyTorch packages using pip. Do the same with and without the `sudo` command:
>
> sudo pip3 uninstall -y torch torchvision torchaudio
> pip3 uninstall -y torch torchvision torchaudio
> pip3 cache purge
>
> 2. Install nccl (Nvidia Collective Communications lib) for CUDA 12.4. Basically, its NCCL 2.20.5 which was released on March 5th, 2024. You can find it on the Nvidia website as follows: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
>
> 3. Next, you'll need to install Nvidia cuDNN. Even if you think you have it, do the steps again. You can go to [Nvidia's cuDNN download page](https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) for instructions.
>
> 4. Finally, the last but most important step is to reinstall PyTorch. Except use the nightly build so that we get the latest version:
>
> pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
>
> At the time of writing, I am running on CUDA 12.4 with PyTorch working now. Here's what it might look like:
>
> import torch
> import torchvision
> import torchaudio
> print(torch.__version__)
> print(torchvision.__version__)
> print(torchaudio.__version__)
> print(torch.cuda.is_available())
>
> Output:
>
> 2.4.0.dev20240326+cu121
> 0.19.0.dev20240327+cu121
> 2.2.0.dev20240327+cu121
> True
>
> Wishing everyone the best! And hopefully PyTorch would provide a stable version for CUDA 12.4 users. Happy coding.

Thanks for your contribution, it works

pytorch / pytorch

Importing PyTorch 2.2 fails with undefined symbol error: ncclCommRegister #119072
