elkay opened 8 months ago
[pip3] torchvision==0.16.2+cu121
[conda] torchvision 0.16.2+cu121 pypi_0 pypi
Try uninstalling these versions first?
What would that accomplish? That's literally the package that I'm trying to use and that is throwing the error.
Built Torch 2.1.2 and TorchVision 2.1.2 from source
What version of torchvision are you building from source, exactly? There's no torchvision 2.x. The latest stable version is 0.17.
The fact that there already is a stable 0.16.2 version installed while you're trying to build from source is very likely to be causing some issues.
Updated original post, torchvision version was a typo.
I did finally get torchvision to build and be functional, but only by forcibly editing the build scripts to pull in my custom build of torch+cuda 2.1.2. The build scripts were importing a non-cuda build because there is no aarch64 torch+cuda out there for pip to pull down. So finally, after forcing my own torch+cuda 2.1.2 whl into the torchvision build, now my torchvision actually works.
I need to say - it's been PAINFUL dealing with building anything that relies on torch because all the build scripts pull down the non-cuda version and mess up the builds. Every time I want to build something relying on torch, now I need to hack in pulling my own torch whl instead for them to work (this also resolved issues I was having building a few other things).
I reaaaaaally hope official aarch64 torch+cuda builds start to be made available so I don't have to keep doing this hackjob.
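In case it helps the next person: by default pip builds packages in an isolated environment and reinstalls the `[build-system]` requirements from PyPI, which on aarch64 means a CPU-only torch, regardless of what is already installed. A sketch of a less invasive workaround, assuming you have a locally built CUDA wheel (the wheel path below is hypothetical):

```shell
# Install the custom CUDA-enabled torch wheel first (path is hypothetical):
pip install /path/to/torch-2.1.2+cu121-cp310-cp310-linux_aarch64.whl

# Then build torchvision against the torch already in the environment,
# instead of letting pip's isolated build env pull a CPU-only torch from PyPI:
pip install -v --no-build-isolation .
```

With `--no-build-isolation` you become responsible for having the other build requirements (e.g. `setuptools`, `wheel`) installed, but it avoids patching the build scripts at all.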
What build script are you referring to? Can you share the build command you used?
The box is shut down, but I believe it was `pyproject.toml` that I had to update to point directly at my torch whl, and the command I used was `python setup.py bdist_wheel`. I had the same outcome with `pip install -v .` to install directly from source, though.
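For reference, the relevant knob in `pyproject.toml` is the `[build-system]` `requires` list, which is exactly what pip's isolated build environment installs. A hedged sketch of the kind of edit described above, pinning the build-time torch to a local wheel via a PEP 508 direct reference (the wheel path is hypothetical, and torchvision's real `requires` list may contain more entries):

```toml
[build-system]
# Point the build-time torch requirement at the locally built CUDA wheel,
# so the isolated build env doesn't fetch the CPU-only aarch64 torch from PyPI.
requires = [
    "setuptools",
    "wheel",
    "torch @ file:///home/ec2-user/wheels/torch-2.1.2+cu121-cp310-cp310-linux_aarch64.whl",
]
```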
🐛 Describe the bug
Built Torch 2.1.2 and TorchVision 0.16.2 from source and running into the following problem:
/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv'. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?

Previously the error was about missing libs rather than an undefined symbol, so I believe the libs are correctly installed now. Building says:
So I believe I do have things set up correctly to be able to do image calls (I don't care about video). Any idea why I would still be getting the undefined symbol warning? Thanks!
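For what it's worth, an undefined-symbol error like this generally means the extension was compiled against headers from a different torch build than the one loaded at runtime, rather than a libjpeg/libpng problem. A generic way to inspect it (the `image.so` path is the one from the warning above; `c++filt` and `ldd` are standard toolchain utilities):

```shell
# Demangle the missing symbol to see which C++ API the extension expects:
echo '_ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv' | c++filt
# -> c10::SymbolicShapeMeta::init_is_contiguous() const

# Check which libtorch/libc10 the extension actually resolves at load time
# (skipped if the file isn't present on this machine):
SO=/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/image.so
if [ -f "$SO" ]; then ldd "$SO" | grep -E 'torch|c10'; fi
```

If the demangled symbol only exists in a newer (or older) torch than the libraries `ldd` resolves, the build picked up mismatched headers.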
Versions
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Amazon Linux 2023.3.20240304 (aarch64)
GCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.34

Python version: 3.10.9 (main, Mar 8 2023, 10:41:45) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.1.79-99.164.amzn2023.aarch64-aarch64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T4G
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r3p1
BogoMIPS: 243.75
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 4 MiB (4 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.16.2+cu121
[pip3] triton==2.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi