pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Importing PyTorch 2.2 fails with undefined symbol error: ncclCommRegister #119072

Open rosario-purple opened 5 months ago

rosario-purple commented 5 months ago

๐Ÿ› Describe the bug

When I upgrade to PyTorch 2.2 via pip, importing torch fails with an undefined symbol error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

Downgrading to Torch 2.1.2 fixed the problem. My best guess is that this is because I have MS-AMP installed (https://github.com/Azure/MS-AMP) which is pinned to an older version of NCCL (https://github.com/Azure/msccl-executor-nccl version 2.17.1), while PyTorch 2.2 depends on a newer version (NCCL 2.19.3).
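
To confirm which NCCL version a process actually picks up, one option is to ask the library itself. The sketch below is an illustration, not an official tool: it assumes Linux, loads the library by the soname `libnccl.so.2` (the name that appears in the ldd output later in this thread), and calls NCCL's `ncclGetVersion()` through Python's `ctypes`. The helper names `decode_nccl_version` and `loaded_nccl_version` are hypothetical. Loading by soname reports the copy the dynamic loader finds in the current environment, which is not necessarily the copy `libtorch_cuda.so` resolves through its own search path; passing a full file path checks one specific library instead.

```python
import ctypes


def decode_nccl_version(code: int) -> tuple[int, int, int]:
    """Split the integer from ncclGetVersion() into (major, minor, patch).

    NCCL 2.9 and later encode a version as major*10000 + minor*100 + patch
    (2.19.3 -> 21903); earlier releases used major*1000 + minor*100 + patch.
    """
    if code >= 10000:
        return code // 10000, (code % 10000) // 100, code % 100
    return code // 1000, (code % 1000) // 100, code % 100


def loaded_nccl_version(soname: str = "libnccl.so.2") -> tuple[int, int, int]:
    """dlopen() the library by name (or path) and ask it which version it is."""
    lib = ctypes.CDLL(soname)
    code = ctypes.c_int()
    status = lib.ncclGetVersion(ctypes.byref(code))  # ncclResult_t; 0 means ncclSuccess
    if status != 0:
        raise RuntimeError(f"ncclGetVersion failed with status {status}")
    return decode_nccl_version(code.value)


if __name__ == "__main__":
    try:
        print("NCCL %d.%d.%d" % loaded_nccl_version())
    except OSError as exc:  # the library could not be found or loaded
        print(f"could not load libnccl.so.2: {exc}")
```

On a system where an MSCCL build of NCCL 2.17.1 wins, this would be expected to report 2.17.x rather than the 2.19.3 that PyTorch 2.2 pins.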

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB

Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.3
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
Stepping: 6
BogoMIPS: 4000.04
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3 MiB (96 instances)
L1i cache: 3 MiB (96 instances)
L2 cache: 192 MiB (48 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] numpy==1.24.4
[pip3] numpyro==0.9.2
[pip3] torch==2.2.0
[pip3] torchaudio==2.2.0
[pip3] torchvision==0.17.0
[pip3] triton==2.2.0
[conda] numpy 1.24.4 pypi_0 pypi
[conda] numpyro 0.9.2 pypi_0 pypi
[conda] torch 2.2.0 pypi_0 pypi
[conda] torchaudio 2.2.0 pypi_0 pypi
[conda] torchvision 0.17.0 pypi_0 pypi
[conda] triton 2.2.0 pypi_0 pypi

cc @seemethere @malfet @osalpekar @atalman

ruifengma commented 5 months ago

CUDA 12.2 works for me with PyTorch 2.2, same Python 3.10.13.

atalman commented 4 months ago

Works on Ubuntu 22.04 installed via docker pull ubuntu:22.04

torch install:

pip install torch
Collecting torch
  Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting filelock (from torch)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch)
  Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 5.7/5.7 MB 88.1 MB/s eta 0:00:00
Collecting networkx (from torch)
  Downloading networkx-3.2.1-py3-none-any.whl.metadata (5.2 kB)
Collecting jinja2 (from torch)
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2024.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 23.7/23.7 MB 82.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 823.6/823.6 kB 60.2 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 14.1/14.1 MB 123.7 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 410.6/410.6 MB 10.1 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 121.6/121.6 MB 31.5 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 56.5/56.5 MB 57.2 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch)
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 124.2/124.2 MB 30.5 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 196.0/196.0 MB 20.2 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.19.3 (from torch)
  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 99.1/99.1 kB 8.5 MB/s eta 0:00:00
Collecting triton==2.2.0 (from torch)
  Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch)
  Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 536.2/536.2 kB 47.2 MB/s eta 0:00:00
Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 755.5/755.5 MB 3.9 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 731.7/731.7 MB 4.1 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 166.0/166.0 MB 25.0 MB/s eta 0:00:00
Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 167.9/167.9 MB 23.8 MB/s eta 0:00:00
Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Downloading fsspec-2024.2.0-py3-none-any.whl (170 kB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 170.9/170.9 kB 16.7 MB/s eta 0:00:00
Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 133.2/133.2 kB 13.9 MB/s eta 0:00:00
Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 1.6/1.6 MB 97.7 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.2.0+cu121'
ldd libtorch_cuda.so 
        linux-vdso.so.1 (0x00007ffc415fa000)
        libc10_cuda.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f6d4a04e000)
        libcudart.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007f6d49c00000)
        libcusparse.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007f6d39c00000)
        libcufft.so.11 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cufft/lib/libcufft.so.11 (0x00007f6d2e000000)
        libcusparseLt-f8b4a9fb.so.0 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libcusparseLt-f8b4a9fb.so.0 (0x00007f6d2bc00000)
        libnvToolsExt.so.1 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/nvtx/lib/libnvToolsExt.so.1 (0x00007f6d2b800000)
        libcurand.so.10 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/curand/lib/libcurand.so.10 (0x00007f6d25200000)
        libcublas.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cublas/lib/libcublas.so.12 (0x00007f6d1e800000)
        libcublasLt.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007f6cfc800000)
        libcudnn.so.8 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cudnn/lib/libcudnn.so.8 (0x00007f6cfc400000)
        libnccl.so.2 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/nccl/lib/libnccl.so.2 (0x00007f6cefa00000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6d4a043000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6d4a03c000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6d4a037000)
        libc10.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libc10.so (0x00007f6d49f39000)
        libtorch_cpu.so => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libtorch_cpu.so (0x00007f6cd85d1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6d49b19000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6cd83a7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6d49f19000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6cd817f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6d7c6f6000)
        libnvJitLink.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x00007f6cd4c00000)
        libgomp-a34b3233.so.1 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./libgomp-a34b3233.so.1 (0x00007f6cd4800000)
        libcupti.so.12 => /root/miniconda3/lib/python3.10/site-packages/torch/lib/./../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007f6cd3e00000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f6d49f10000)

@rosario-purple could you please run ldd /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so on the machine where you are seeing this issue?
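
A small helper can pull the NCCL entry out of ldd output so that two machines are easy to compare. This is a hypothetical sketch (`parse_ldd` and `resolve_deps` are illustrative names, not part of any tool) that assumes the usual glibc `ldd` line format `name => path (address)`; entries printed as `not found` are kept so missing libraries stand out, and entries without `=>` (such as `linux-vdso.so.1`) are skipped.

```python
import re
import subprocess
import sys

# Matches "libnccl.so.2 => /path/libnccl.so.2 (0x...)" and "libfoo.so => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output: str) -> dict[str, str]:
    """Map each dependency name to the path ldd resolved it to (or 'not found')."""
    resolved = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def resolve_deps(shared_object: str) -> dict[str, str]:
    """Run ldd on a shared object and return the parsed mapping."""
    result = subprocess.run(["ldd", shared_object], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    # Usage: pass the path(s) to libtorch_cuda.so on the command line.
    for target in sys.argv[1:]:
        deps = resolve_deps(target)
        print(target, "->", deps.get("libnccl.so.2", "libnccl.so.2 not listed"))
```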

malfet commented 4 months ago

I guess there isn't much one can do here other than mark torch as incompatible with msccl-executor-nccl. One can override it with LD_LIBRARY_PATH, though.

Another solution would be to update the bundled NCCL binaries inside msccl-executor-nccl with the ones shipped with PyTorch (not sure it will work, but perhaps worth trying, as NCCL should be forward compatible).
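
For the LD_LIBRARY_PATH override, the directory to prepend is the one holding the NCCL that came with the PyTorch wheel. Judging by the ldd output earlier in this thread, pip places it at `site-packages/nvidia/nccl/lib`, next to the `torch` package. The sketch below assumes that layout and uses `importlib.util.find_spec`, which locates the package without running `torch/__init__.py`, so it works even while `import torch` fails. The function names are hypothetical. The variable has to be exported before the Python process starts, and whether it takes precedence over the search path recorded in `libtorch_cuda.so` depends on how that path was recorded (RPATH vs RUNPATH), so treat this as something to try rather than a guaranteed fix.

```python
import importlib.util
import os


def bundled_nccl_dir(torch_pkg_dir: str) -> str:
    """Directory where the nvidia-nccl-cu12 wheel is assumed to keep libnccl.so.2,
    relative to the installed torch package."""
    return os.path.normpath(os.path.join(torch_pkg_dir, "..", "nvidia", "nccl", "lib"))


def ld_library_path_with(first_dir: str, current: str = "") -> str:
    """Put first_dir at the front of an LD_LIBRARY_PATH-style string, dropping duplicates."""
    parts = [p for p in current.split(os.pathsep) if p and p != first_dir]
    return os.pathsep.join([first_dir] + parts)


if __name__ == "__main__":
    spec = importlib.util.find_spec("torch")  # does not execute torch/__init__.py
    if spec is None or spec.origin is None:
        print("torch is not installed in this environment")
    else:
        nccl_dir = bundled_nccl_dir(os.path.dirname(spec.origin))
        new_path = ld_library_path_with(nccl_dir, os.environ.get("LD_LIBRARY_PATH", ""))
        print(f"export LD_LIBRARY_PATH={new_path}")
```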

atalman commented 4 months ago

cc @ptrblck

Aidyn-A commented 4 months ago

> My best guess is that this is because I have MS-AMP installed (https://github.com/Azure/MS-AMP) which is pinned to an older version of NCCL (https://github.com/Azure/msccl-executor-nccl version 2.17.1), while PyTorch 2.2 depends on a newer version (NCCL 2.19.3).

That is the exact reason why it fails with an undefined symbol. ncclCommRegister was introduced in NCCL v2.19, and is being utilized in PyTorch since November (https://github.com/pytorch/pytorch/commit/ab1f6d58bc57faa89b74b98a27fc38e90abf8520).
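
To check whether a given NCCL build is new enough for PyTorch 2.2, one can test for the `ncclCommRegister` export directly instead of relying on a version string. A minimal sketch using `ctypes`, which looks symbols up with `dlsym()` and raises `AttributeError` for missing ones; `exports_symbol` is a hypothetical helper. Passing an explicit file path instead of `libnccl.so.2` checks one specific copy, for example the msccl-executor-nccl build.

```python
import ctypes


def exports_symbol(library: str, symbol: str) -> bool:
    """True if `library` can be loaded and exports `symbol`.

    Returns False both when the symbol is missing and when the library
    cannot be loaded at all.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False
    # Attribute access goes through dlsym(); a missing symbol raises AttributeError.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Per the comment above, ncclCommRegister first appeared in NCCL 2.19.
    print(exports_symbol("libnccl.so.2", "ncclCommRegister"))
```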

lucasjinreal commented 4 months ago

Yes, this should be communicated to as many users as possible. From what I can see, it breaks every torch import whenever NCCL is older than 2.19, and such versions are still commonly used.

Also, since NCCL recently had a bug affecting parallel training with torch, one fix is to upgrade NCCL; but users who upgraded NCCL might still have torch linked against the wrong copy.

Please write a guide to help users resolve NCCL-related issues. Thank you!
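
As one piece of such a guide, it can help to list every NCCL copy that `LD_LIBRARY_PATH` exposes, since an upgraded NCCL can be shadowed by an older one that appears earlier in the path. A hypothetical sketch (`nccl_candidates` is an illustrative name): it only inspects `LD_LIBRARY_PATH`, not RPATH/RUNPATH entries, `/etc/ld.so.cache`, or `LD_PRELOAD`, so an empty result does not rule out a conflict.

```python
import glob
import os


def nccl_candidates(ld_library_path: str) -> list[str]:
    """List libnccl.so* files in the directories of an LD_LIBRARY_PATH-style
    string, in the order those directories appear."""
    found = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ":"
        found.extend(sorted(glob.glob(os.path.join(directory, "libnccl.so*"))))
    return found


if __name__ == "__main__":
    for path in nccl_candidates(os.environ.get("LD_LIBRARY_PATH", "")):
        # realpath follows symlinks like libnccl.so.2 -> libnccl.so.2.x.y
        print(path, "->", os.path.realpath(path))
```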

rosario-purple commented 4 months ago

@atalman Sure here's the output

(brr) alyssavance@7e72bd4e-01:/scratch/brr$ ldd /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
        linux-vdso.so.1 (0x00007ffc5f7c1000)
        libc10_cuda.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libc10_cuda.so (0x00001541cbd1a000)
        libcudart.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00001541cba00000)
        libcusparse.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00001541bba00000)
        libcurand.so.10 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00001541b5400000)
        libcufft.so.11 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00001541a9800000)
        libnvToolsExt.so.1 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/nvtx/lib/libnvToolsExt.so.1 (0x00001541a9400000)
        libcudnn.so.8 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 (0x00001541a9000000)
        libnccl.so.2 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x0000154198e00000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00001541cbd07000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00001541cbd02000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00001541cbcfd000)
        libc10.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libc10.so (0x00001541cb922000)
        libtorch_cpu.so => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so (0x0000154181ee8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00001541bb919000)
        libcublas.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x000015417b600000)
        libcublasLt.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x0000154159600000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00001541593d4000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00001541cbcd9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00001541591ab000)
        /lib64/ld-linux-x86-64.so.2 (0x00001541f765b000)
        libnvJitLink.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x0000154156000000)
        libgomp-a34b3233.so.1 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x0000154155c00000)
        libcupti.so.12 => /scratch/miniconda3/envs/brr/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x0000154155200000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00001541cbcd2000)
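
In this output libnccl.so.2 already resolves to the pip-installed NCCL wheel, yet the import still fails. ldd describes how the loader would resolve dependencies for a fresh process; it does not show a libnccl that was already mapped into the process by other means (for example an LD_PRELOAD entry or an extension module imported earlier). One possible way to check, sketched here under the assumption of Linux and `/proc/self/maps`, is to list the NCCL files actually mapped into the running interpreter. Run it after importing whatever else the failing script imports before torch. `mapped_libraries` is a hypothetical helper.

```python
import ctypes


def mapped_libraries(maps_text: str, needle: str = "libnccl") -> list[str]:
    """Return the distinct file paths in /proc/<pid>/maps text that contain `needle`."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; pathname may be absent.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and needle in fields[5] and fields[5] not in paths:
            paths.append(fields[5])
    return paths


if __name__ == "__main__":
    try:
        ctypes.CDLL("libnccl.so.2")  # no-op if an NCCL with this soname is already loaded
    except OSError as exc:
        print(f"could not load libnccl.so.2: {exc}")
    with open("/proc/self/maps") as maps:
        print(mapped_libraries(maps.read()))
```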

mvsjober commented 4 months ago

This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1: https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/

ptrblck commented 4 months ago

> This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1

NCCL uses CUDA 12.2 to build its binaries and statically links the CUDART to them. This is a common approach and will not cause any incompatibilities. In PyTorch we are depending on the NCCL PyPI wheel using the same toolchain. Could you explain why it's highly problematic?

mvsjober commented 4 months ago

> > This is highly problematic as NVIDIA provide NCCL 2.19 rpms for RHEL8 only for CUDA 12.2 and above, while PyTorch binaries are for CUDA 12.1

> NCCL uses CUDA 12.2 to build its binaries and statically links the CUDART to them. This is a common approach and will not cause any incompatibilities. In PyTorch we are depending on the NCCL PyPI wheel using the same toolchain. Could you explain why it's highly problematic?

I'm building a container with PyTorch, and I've always kept the CUDA rpms at the same version as the one the PyTorch binaries were linked against. I just assumed it would cause problems if PyTorch itself was linked against a different CUDA version.

In fact I had some problems after switching to CUDA 12.2, but now it turns out this was an unrelated thing. So maybe it will work...

lucasjinreal commented 4 months ago

This issue usually doesn't happen, because the system CUDA can also handle what torch was linked against; but with CUDA 12.2, some functions that torch uses may not be found.

dominicklee commented 3 months ago

Hello all, I had the same problem myself, and I'm posting this in the hope that it helps anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on a Windows workstation PC with CUDA 12.4, which should be the latest version. I'm using Ubuntu 22.04 under WSL2. I tried uninstalling and reinstalling PyTorch with pip, to no avail: every time I tried to import PyTorch in Python, I got this error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

I'm aware that PyTorch is currently built for CUDA 12.1, but after some hours of troubleshooting I got it to work. Here is what ultimately worked for me:

  1. First, uninstall all the PyTorch packages using pip. Do the same with and without the sudo command:
    sudo pip3 uninstall -y torch torchvision torchaudio
    pip3 uninstall -y torch torchvision torchaudio
    pip3 cache purge
  2. Install NCCL (NVIDIA Collective Communications Library) for CUDA 12.4. That is NCCL 2.20.5, which was released on March 5th, 2024. You can find it on the Nvidia website: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
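  3. Next, you'll need to install Nvidia cuDNN. Even if you think you already have it, do the steps again. See Nvidia's cuDNN download page (https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) for instructions.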
  4. Finally, the last but most important step is to reinstall PyTorch, this time using the nightly build to get the latest version:
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

At the time of writing, PyTorch is working for me on CUDA 12.4. Here's a quick check:

import torch
import torchvision
import torchaudio
print(torch.__version__)
print(torchvision.__version__)
print(torchaudio.__version__)
print(torch.cuda.is_available())

Output:

2.4.0.dev20240326+cu121
0.19.0.dev20240327+cu121
2.2.0.dev20240327+cu121
True

Wishing everyone the best! Hopefully PyTorch will provide a stable release for CUDA 12.4 users. Happy coding.
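
A similar check can confirm that the NCCL PyTorch reports is at least 2.19, the release that introduced `ncclCommRegister` according to Aidyn-A's comment. Sketch only, assuming a CUDA-enabled build: in recent releases `torch.cuda.nccl.version()` returns a tuple whose first three entries are (major, minor, patch), possibly followed by a suffix, and `supports_comm_register` is a hypothetical helper.

```python
def supports_comm_register(nccl_version) -> bool:
    """True if (major, minor, patch, ...) is at least NCCL 2.19."""
    return tuple(nccl_version[:3]) >= (2, 19, 0)


if __name__ == "__main__":
    try:
        import torch
        import torch.cuda.nccl
    except ImportError as exc:  # the undefined-symbol failure surfaces as ImportError
        print(f"import torch failed: {exc}")
    else:
        if torch.cuda.is_available():
            version = torch.cuda.nccl.version()
            print(version, supports_comm_register(version))
        else:
            print("CUDA is not available; skipping the NCCL check")
```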

harrishyp commented 3 months ago

I encountered the same problem and successfully used the method you provided. Thank you

VictorNanka commented 2 months ago

Thanks for your contribution; it works.