pytorch / functorch

functorch is JAX-like composable function transforms for PyTorch.
https://pytorch.org/functorch/
BSD 3-Clause "New" or "Revised" License

installation issue #1062

Open Tsingularity opened 1 year ago

Tsingularity commented 1 year ago

Hi, thanks for the great library!

I ran into the following issues when trying to install functorch:

  1. The docs say functorch is already included in PyTorch 1.13, but after I installed pytorch-nightly, it still tells me there is no module named functorch.
  2. So I installed functorch with pip, but after installation I get the following UserWarning when running my code. It looks like torchvision is broken? I saw another issue with a similar error message, but for torchaudio.
    anaconda3/envs/nv/lib/python3.9/site-packages/torchvision/io/image.py:13: 
    UserWarning: Failed to load image Python extension: /local/lumingt/anaconda3/envs/nv/lib/python3.9/site-packages/torchvision/image.so:
    undefined symbol: _ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE
    warn(f"Failed to load image Python extension: {e}")

    Could you please take a look at this and tell me what I should do now? Thanks!
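A quick way to see what the active environment actually contains is sketched below. It is only a minimal sketch using the Python standard library, and the helper names `installed_version` and `is_importable` are hypothetical, not part of torch or functorch. It reports, for each package, the version recorded by pip/conda metadata and whether the interpreter can locate the module, since the two can disagree.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if the active interpreter can locate the top-level module."""
    # find_spec only locates a top-level module; it does not import it.
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("torch", "torchvision", "functorch"):
        print(f"{name}: version={installed_version(name)} "
              f"importable={is_importable(name)}")
```

Running it with the same interpreter that runs the failing code (for example the `nv` conda environment in the warning above) shows whether that environment really has the packages you expect.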

zou3519 commented 1 year ago

What version of torch do you have installed? If you could run the following script and show the output then that would be helpful to diagnose

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Tsingularity commented 1 year ago

Thanks for the help!

Here is the output:

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.24.3
Libc version: glibc-2.27

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-189-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX A6000
Nvidia driver version: 515.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] functorch==1.13.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.4
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.13.0
[pip3] torch-optimizer==0.3.0
[pip3] torchvision==0.13.1
[conda] blas                      2.116                       mkl    conda-forge
[conda] blas-devel                3.9.0            16_linux64_mkl    conda-forge
[conda] cudatoolkit               11.6.0              hecad31d_10    conda-forge
[conda] cudatoolkit-dev           11.3.1           py39h3811e60_0    conda-forge
[conda] functorch                 1.13.0                   pypi_0    pypi
[conda] libblas                   3.9.0            16_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            16_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            16_linux64_mkl    conda-forge
[conda] liblapacke                3.9.0            16_linux64_mkl    conda-forge
[conda] mkl                       2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-devel                 2022.1.0           ha770c72_916    conda-forge
[conda] mkl-include               2022.1.0           h84fe81f_915    conda-forge
[conda] numpy                     1.23.4           py39h3d75532_1    conda-forge
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     1.13.0                   pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchvision               0.13.1               py39_cu116    pytorch

Tsingularity commented 1 year ago

Follow-up to the above:

After installing functorch with pip, it looks like some of my custom CUDA kernels also broke, with a similar undefined symbol error message.

zou3519 commented 1 year ago

It looks like you have torch 1.13.0 and functorch 1.13.0. The version of torchvision that is compatible with those is 0.14.0, but it looks like you have 0.13.1 installed. I don't know which custom CUDA kernels you're referring to, but if you are getting undefined symbol errors, then that means that they don't work with torch 1.13.0.
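The version check described above can be sketched as a small script. This is only an illustration: the table `KNOWN_PAIRS` and the helper names are hypothetical, and the table lists just a few release pairings (torch 1.12 with torchvision 0.13, 1.13 with 0.14, 2.0 with 0.15). Extend it from the torchvision compatibility table for other releases.

```python
import re

# A few torch -> torchvision release pairings, keyed by (major, minor).
# Assumption: this short table is for illustration and is not exhaustive.
KNOWN_PAIRS = {
    (1, 12): (0, 13),
    (1, 13): (0, 14),
    (2, 0): (0, 15),
}


def major_minor(version):
    """Parse '1.13.0+cu117' into (1, 13), ignoring patch and local suffixes."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torchvision_matches(torch_version, torchvision_version):
    """Return True/False for a known torch release, or None if it is not in the table."""
    expected = KNOWN_PAIRS.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


if __name__ == "__main__":
    # Versions taken from the collect_env output earlier in this thread.
    print(torchvision_matches("1.13.0+cu117", "0.13.1"))
```

For the environment reported in this thread the check comes back False, which is consistent with the undefined symbol warning from torchvision's image extension.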