pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

segmentation fault when importing torchvision #8236

Status: Closed (Romeo-CC closed this issue 9 months ago)

Romeo-CC commented 9 months ago

🐛 Describe the bug

Importing torchvision causes a segmentation fault.

Platform:

MacBook Pro 2018 (13.3-inch) with macOS 14.3

Pytorch Version

2.1.2

Torchvision Version:

0.16.2

How to Reproduce

Run the following in a shell:

python -c 'import torchvision'

The output is:

zsh: segmentation fault  python -c 'import torchvision'
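A bare "segmentation fault" message gives no hint about where the crash happens. One way to get more information is to run the import in a child interpreter with Python's built-in `faulthandler` enabled, which dumps the Python traceback to stderr when the process receives a fatal signal. The sketch below is an illustration, not part of the original report; the helper name `try_import` is made up for this example.

```python
from __future__ import annotations

import subprocess
import sys


def try_import(module: str, timeout: float = 120.0) -> tuple[int, str]:
    """Import `module` in a child interpreter with faulthandler enabled.

    Returns (return code, captured stderr). On POSIX systems a negative
    return code -N means the child was terminated by signal N.
    """
    proc = subprocess.run(
        # "-X faulthandler" makes the child dump a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr
```

Calling `try_import("torchvision")` in an affected environment should return a non-zero code, and the captured stderr should show which Python-level import was running when the crash occurred. Because the import runs in a child process, the calling interpreter survives the crash.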

Versions

PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.3 (x86_64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.1.0.2.5)
CMake version: version 3.28.1
Libc version: N/A

Python version: 3.11.7 (main, Dec 15 2023, 12:09:04) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2
[pip3] torchaudio==2.1.2
[pip3] torchdata==0.7.1
[pip3] torchtext==0.16.2
[pip3] torchvision==0.16.2
[conda] blas 1.0 mkl https://repo.anaconda.com/pkgs/main
[conda] mkl 2023.1.0 h8e150cf_43560 https://repo.anaconda.com/pkgs/main
[conda] mkl-service 2.4.0 py311h6c40b1e_1 https://repo.anaconda.com/pkgs/main
[conda] mkl_fft 1.3.8 py311h6c40b1e_0 https://repo.anaconda.com/pkgs/main
[conda] mkl_random 1.2.4 py311ha357a0b_0 https://repo.anaconda.com/pkgs/main
[conda] numpy 1.26.3 py311h728a8a3_0 https://repo.anaconda.com/pkgs/main
[conda] numpy-base 1.26.3 py311h53bf9ac_0 https://repo.anaconda.com/pkgs/main
[conda] torch 2.1.2 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchdata 0.7.1 pypi_0 pypi
[conda] torchtext 0.16.2 pypi_0 pypi
[conda] torchvision 0.16.2 pypi_0 pypi

Romeo-CC commented 9 months ago

Same behavior on Linux:

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 23.10 (x86_64)
GCC version: (Ubuntu 13.2.0-4ubuntu3) 13.2.0
Clang version: 16.0.6 (15)
CMake version: Could not collect
Libc version: glibc-2.38

Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-15-generic-x86_64-with-glibc2.38
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 545.29.02
cuDNN version: Probably one of the following:
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
CPU(s) scaling MHz: 72%
CPU max MHz: 4700.0000
CPU min MHz: 800.0000
BogoMIPS: 7399.70
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.3
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchdata==0.7.1
[pip3] torchtext==0.16.2
[pip3] torchvision==0.16.2+cu121
[pip3] triton==2.1.0
[conda] blas 1.0 mkl defaults
[conda] mkl 2023.1.0 h213fc3f_46344 defaults
[conda] mkl-service 2.4.0 py311h5eee18b_1 defaults
[conda] mkl_fft 1.3.8 py311h5eee18b_0 defaults
[conda] mkl_random 1.2.4 py311hdb19cb5_0 defaults
[conda] numpy 1.26.3 py311h08b1b3b_0 defaults
[conda] numpy-base 1.26.3 py311hf175353_0 defaults
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2+cu121 pypi_0 pypi
[conda] torchdata 0.7.1 pypi_0 pypi
[conda] torchtext 0.16.2 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi

NicolasHug commented 9 months ago

Thanks for the report @Romeo-CC - can you provide more details on how you created your environment and how both torch and torchvision were installed?

Romeo-CC commented 9 months ago

I installed cuda-12.3 and cudnn 8.9.7 manually (and set the environment variables).

Then I downloaded the binary wheel files from https://download.pytorch.org/whl/cu121 and installed via pip

torch-2.1.2+cu121-cp311-cp311-linux_x86_64.whl

torchvision-0.16.2+cu121-cp311-cp311-linux_x86_64.whl

NicolasHug commented 9 months ago

Thanks for the details. There's likely a build incompatibility between these binaries, although I wouldn't know how to confirm that. Since you're using pip, have you tried following the official installation instructions (https://pytorch.org/get-started/locally/)? It's probably safer to let pip (or conda) work out exactly which binary you need than to install the wheels manually.

If you really want to download and install manually, first figure out exactly which binaries you need by running the official instructions and looking at the package names.
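As a first sanity check on a manual install, the release pairing and the build tag of the two packages can be compared from version strings alone. The sketch below is illustrative only: the helper names are invented, and the small pairing table is a subset written down here as an assumption based on the compatibility table in the torchvision README. Reading versions through `importlib.metadata` avoids importing the packages, which matters when the import itself crashes.

```python
from __future__ import annotations

from importlib import metadata

# torch major.minor -> expected torchvision major.minor (small subset, assumed
# from the torchvision README compatibility table).
EXPECTED_TORCHVISION = {"2.1": "0.16", "2.2": "0.17", "2.3": "0.18"}


def major_minor(version: str) -> str:
    """Reduce e.g. '2.1.2+cu121' to '2.1'."""
    public = version.split("+", 1)[0]
    return ".".join(public.split(".")[:2])


def local_tag(version: str) -> str:
    """Return the build tag after '+', e.g. 'cu121', or '' if there is none."""
    return version.split("+", 1)[1] if "+" in version else ""


def check_pair(torch_version: str, vision_version: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    expected = EXPECTED_TORCHVISION.get(major_minor(torch_version))
    if expected is None:
        problems.append(f"torch {torch_version} is not in the small table used here")
    elif major_minor(vision_version) != expected:
        problems.append(
            f"torch {torch_version} expects torchvision {expected}.x, "
            f"found {vision_version}"
        )
    if local_tag(torch_version) != local_tag(vision_version):
        problems.append(
            f"build tags differ: {local_tag(torch_version)!r} "
            f"vs {local_tag(vision_version)!r}"
        )
    return problems


def installed_versions() -> tuple[str, str]:
    """Read the installed versions without importing either package."""
    return metadata.version("torch"), metadata.version("torchvision")
```

For example, `check_pair(*installed_versions())` checks whatever is installed in the current environment. For the versions reported in this thread (2.1.2+cu121 and 0.16.2+cu121) the check finds no problem, so matching version numbers alone did not rule out the crash.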

Romeo-CC commented 9 months ago

Unfortunately, access to PyPI where I live is extremely unreliable. Because of the GFW (the state-run internet censorship and content-filtering system), installing torch or any other wheel directly via pip, as the official instructions describe, is almost impossible: downloads run at about 1 KB/s. That's why I install manually.

Romeo-CC commented 9 months ago

@NicolasHug

Here are more details:

python -c "import torchvision"

/home/XXX/miniconda3/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/XXX/miniconda3/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(

Segmentation fault (core dumped)
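The undefined symbol in the warning carries a clue about how `image.so` was built. In Itanium C++ name mangling, `Ss` abbreviates `std::string` as libstdc++ mangles it under the pre-C++11 ABI, whereas builds using the newer ABI spell the type out with an `NSt7__cxx11` prefix. The sketch below is a rough heuristic for classifying a mangled name, not a demangler, and the function name is made up for this example. It does not by itself establish the cause of the crash reported here.

```python
def libstdcxx_string_abi(mangled: str) -> str:
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only: it looks for well-known substrings and can be fooled by
    identifiers that happen to contain "Ss".
    """
    # New (cxx11) ABI: std::__cxx11::basic_string<...> is spelled out in full.
    if "NSt7__cxx1112basic_string" in mangled:
        return "cxx11"
    # Old ABI: "Ss" is the standard abbreviation for std::string.
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"
```

Applied to the symbol from the warning, `_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs`, the function returns `"pre-cxx11"`. If the installed torch can be imported, `torch.compiled_with_cxx11_abi()` reports which ABI that build uses, and a disagreement between the two would point to mismatched binaries.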

NicolasHug commented 9 months ago

Ah, sorry to hear that. Are there the same restrictions when using conda instead of pip? Alternatively you could try installing the latest pytorch nightlies (which are on https://download.pytorch.org/whl/torch/) and then build torchvision from source?

Romeo-CC commented 9 months ago

I tried installing torch-2.1.2 and torchvision-0.16.2 via conda, and the segmentation fault is still there.

Then I installed the latest nightly builds, torch-2.3.0.dev20240129+cu121 and torchvision-0.18.0.dev20240129+cu121 (downloaded from https://download.pytorch.org/whl/nightly/), with both cuDNN and CUDA installed via pip from a PyPI mirror.

The segmentation fault seems to be resolved.

Romeo-CC commented 9 months ago

Sorry, my fault. I'll close this.