pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Cannot import torchdata when installed with conda on WSL #961

Open erip opened 1 year ago

erip commented 1 year ago

🐛 Describe the bug

$ conda install -c pytorch torchdata -y
$ python -c "import torchdata"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/__init__.py", line 7, in <module>
    from torchdata import _extension  # noqa: F401
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 34, in <module>
    _init_extension()
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 31, in _init_extension
    from torchdata import _torchdata as _torchdata
ImportError: /usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0
$ conda uninstall torchdata -y && pip install torchdata
$ python -c "import torchdata"
$

Versions

PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 528.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] torch==1.13.1
[pip3] torchdata==0.5.1
[pip3] torchtext==0.14.1
[conda] blas 1.0 mkl
[conda] mkl 2022.1.0 hc2b9512_224
[conda] pytorch 1.13.1 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h67b0de4_1 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchdata 0.5.1 pypi_0 pypi
[conda] torchtext 0.14.1 py310 pytorch

ejguan commented 1 year ago

Try installing libffi from conda?

erip commented 1 year ago

I did and it didn't help (no change in the error). This is a clean install, so it seems like an issue with the conda packaging of torchdata. Note that the wheel is fine.

ejguan commented 1 year ago

Do you mind checking the libffi version? If it was installed from conda, you should be able to find it via conda list.

ejguan commented 1 year ago

When torchdata 0.5.1 was compiled, it used libffi 3.4.2. It seems your environment is trying to use LIBFFI_BASE_7.0.

erip commented 1 year ago

Yes, here are the relevant details:

$ conda list | grep libffi
libffi                    3.4.2                h6a678d5_6
$ python -c "import torchdata"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/__init__.py", line 7, in <module>
    from torchdata import _extension  # noqa: F401
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 34, in <module>
    _init_extension()
  File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 31, in _init_extension
    from torchdata import _torchdata as _torchdata
ImportError: /usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0

It seems like the right lib is coming along for the ride, but something is causing it to be linked improperly. I tried messing with LD_LIBRARY_PATH as well but it didn't seem to help.

ejguan commented 1 year ago

TBH, I don't know. How about trying to use pip to reinstall cffi as well?

erip commented 1 year ago

It doesn't seem to help: with a force reinstall, cffi 1.15.1 gets (re)installed, but the error is the same. I can dig into this a bit more soon, but the short-term workaround is to pip install torchdata.

ejguan commented 1 year ago

Yeah. I think pip works simply because we statically link those libraries into the wheel, which prevents this scenario of picking up the wrong shared lib.

erip commented 1 year ago

One thing I intend to do is look in $CONDA_PREFIX to see if the right version of libffi is there. I'm not sure what the offending lib is from the error message (looks like a crypto lib?) so I'm not sure how it interacts with the search path either...
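A complementary check to browsing $CONDA_PREFIX is to ask a running interpreter which libffi file the dynamic loader actually mapped. The following is a minimal sketch, not something from the thread: it assumes a Linux /proc filesystem (which WSL2 provides), it assumes that importing ctypes loads libffi dynamically (typical for conda builds of CPython, but not guaranteed), and the helper name mapped_libraries is made up for illustration.

```python
import os


def mapped_libraries(maps_text: str, needle: str) -> list[str]:
    """Return sorted unique file paths from /proc/<pid>/maps text
    whose basename contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if path.startswith("/") and needle in os.path.basename(path):
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    # Assumption: ctypes pulls in libffi as a shared library in this build.
    import ctypes  # noqa: F401

    try:
        with open("/proc/self/maps") as fh:
            print(mapped_libraries(fh.read(), "libffi"))
    except OSError:
        print("/proc/self/maps is not available on this platform")
```

If the printed path sits under the conda environment, a system library that is loaded later and also needs libffi.so.7 would most likely be bound to that same already-loaded copy rather than to the one in /lib/x86_64-linux-gnu.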

erip commented 1 year ago

After conda installing torchdata, I see the following:

$ find $CONDA_PREFIX -name "*libffi*" | xargs strings | grep LIBFFI_BASE_ | sort -u
LIBFFI_BASE_7.0
LIBFFI_BASE_7.1
LIBFFI_BASE_8.0

I think this suggests that the right libffi might not be getting bundled, though I'm not positive since 3.4.2 is clearly installed as reported by conda.
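The find | xargs strings pipeline above merges the tags from every matching file into one list, so it cannot say which file carries which tag. A rough per-file version of the same scan could look like the sketch below. It is only an approximation of strings plus grep (a plain byte search, not a real ELF symbol-version parser), and the names libffi_tags and tags_by_file are hypothetical.

```python
import os
import re

# Matches version tags such as LIBFFI_BASE_7.0 in raw bytes.
TAG_RE = re.compile(rb"LIBFFI_[A-Z_]+_\d+\.\d+")


def libffi_tags(data: bytes) -> list[str]:
    """Return the sorted, de-duplicated libffi version tags found in `data`."""
    return sorted({tag.decode("ascii") for tag in TAG_RE.findall(data)})


def tags_by_file(root: str) -> dict[str, list[str]]:
    """Map every file under `root` whose name contains 'libffi' to its tags."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if "libffi" not in name:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    result[path] = libffi_tags(fh.read())
            except OSError:
                continue  # unreadable file or dangling symlink
    return result


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        for path, tags in sorted(tags_by_file(prefix).items()):
            print(path, tags)
```

Seeing the tags per file would show whether the environment's libffi.so.7 itself contains LIBFFI_BASE_7.0, or whether that tag only appears in some other file that matched the name pattern.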

ejguan commented 1 year ago

Do you mind sharing the result of ldd torchdata/_torchdata.so?

erip commented 1 year ago

$ conda create -n tmp -y python=3.10 -q && conda activate tmp && conda install -c pytorch torchdata -y -q
$ ldd $CONDA_PREFIX/lib/python3.10/site-packages/torchdata/_torchdata.so
        linux-vdso.so.1 (0x00007ffeb9bb5000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007faad19c0000)
        libz.so.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libz.so.1 (0x00007faad19a2000)
        libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007faad1910000)
        libssl.so.1.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libssl.so.1.1 (0x00007faad187f000)
        libcrypto.so.1.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libcrypto.so.1.1 (0x00007faad15b2000)
        libstdc++.so.6 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libstdc++.so.6 (0x00007faad139c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007faad124d000)
        libgcc_s.so.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libgcc_s.so.1 (0x00007faad1233000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faad1041000)
        /lib64/ld-linux-x86-64.so.2 (0x00007faad210f000)
        libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007faad1018000)
        libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007faad0ff7000)
        librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007faad0fd5000)
        libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007faad0f67000)
        libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007faad0f54000)
        libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007faad0f07000)
        libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007faad0eb1000)
        liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007faad0ea0000)
        libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007faad0e90000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007faad0e8a000)
        libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007faad0d08000)
        libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007faad0b32000)
        libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007faad0afb000)
        libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007faad0abf000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007faad0a3b000)
        libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007faad095e000)
        libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007faad092d000)
        libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007faad0926000)
        libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007faad0917000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007faad08f9000)
        libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007faad08dc000)
        libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007faad0897000)
        libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007faad0874000)
        libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007faad073e000)
        libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007faad0728000)
        libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007faad071f000)
        libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007faad0713000)
        libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007faad0680000)
        libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007faad05da000)
        libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007faad05a2000)
        libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007faad0587000)
        libffi.so.7 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libffi.so.7 (0x00007faad0576000)
        libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007faad054c000)
        libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007faad053a000)
        libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007faad04ec000)
        libsqlite3.so.0 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libsqlite3.so.0 (0x00007faad039e000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007faad0363000)

ejguan commented 1 year ago

So, it correctly finds the conda-provided libffi:

libffi.so.7 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libffi.so.7 (0x00007faad0576000)

Could you please share if there is libffi under /lib/x86_64-linux-gnu? And, ldd /lib/x86_64-linux-gnu/libp11-kit.so.0?
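The ldd listing above mixes two origins: some libraries (libffi.so.7, libssl.so.1.1, libz.so.1) resolve inside the conda environment, while others (libcurl.so.4, libp11-kit.so.0) resolve under /lib/x86_64-linux-gnu. A small sketch that splits ldd output along that line makes the mix easier to see. The function name split_by_prefix is made up here, and the parser only assumes the "name => path (address)" line shape shown above.

```python
import os
import re

# Matches lines such as "libffi.so.7 => /path/to/libffi.so.7 (0x00007f...)".
LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def split_by_prefix(ldd_output: str, env_prefix: str):
    """Split resolved libraries into (inside env_prefix, outside env_prefix).

    Each result maps the library name to its normalized path.
    """
    inside, outside = {}, {}
    prefix = os.path.normpath(env_prefix) + os.sep
    for line in ldd_output.splitlines():
        match = LDD_RE.match(line)
        if not match:
            continue  # vdso, the loader itself, or "not found" entries
        name, path = match.groups()
        # normpath collapses the "torchdata/../../.." segments seen above
        resolved = os.path.normpath(path)
        target = inside if resolved.startswith(prefix) else outside
        target[name] = resolved
    return inside, outside
```

Feeding it the output of ldd on _torchdata.so together with $CONDA_PREFIX gives two dictionaries. Any library in the second one that itself depends on libffi is a candidate for the mismatch reported in the traceback.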

erip commented 1 year ago

> Could you please share if there is libffi under /lib/x86_64-linux-gnu?

Yes, it seems like it.

$ find /lib/x86_64-linux-gnu/ -name "*libffi*"
/lib/x86_64-linux-gnu/pkgconfig/libffi.pc
/lib/x86_64-linux-gnu/libffi.so.7
/lib/x86_64-linux-gnu/libffi.a
/lib/x86_64-linux-gnu/libffi_pic.a
/lib/x86_64-linux-gnu/libffi.so
/lib/x86_64-linux-gnu/libffi.so.7.1.0

> And, ldd /lib/x86_64-linux-gnu/libp11-kit.so.0?

$ ldd /lib/x86_64-linux-gnu/libp11-kit.so.0
        linux-vdso.so.1 (0x00007ffc8eb97000)
        libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007fb798408000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb798402000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb7983df000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb7981ed000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb798555000)

ejguan commented 1 year ago

I guess there might be some BC-breaking changes between the system libffi.so.7.1.0 and the libffi.so.7 from conda.

But it's still unclear to me why conda installs another libffi.so.7 when there is already one on your system. Can you please check when libffi.so.7 gets installed by conda: when the new conda environment is created, or when pytorch or torchdata is installed?
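One way to approach that question is to find out which installed conda package ships libffi.so.7, since the owning package indicates which install step brought the file in. The sketch below reads the per-package JSON records that conda keeps under $CONDA_PREFIX/conda-meta and looks at their "files" lists. The helper names matching_files and owners_of are hypothetical, and the result is only as complete as those records.

```python
import glob
import json
import os


def matching_files(record: dict, filename: str) -> list[str]:
    """Return the paths in a conda-meta record whose basename is `filename`."""
    return [f for f in record.get("files", []) if os.path.basename(f) == filename]


def owners_of(prefix: str, filename: str) -> dict[str, list[str]]:
    """Map each conda-meta record name (name-version-build) to matching files."""
    owners = {}
    pattern = os.path.join(prefix, "conda-meta", "*.json")
    for record_path in sorted(glob.glob(pattern)):
        try:
            with open(record_path) as fh:
                record = json.load(fh)
        except (OSError, ValueError):
            continue  # unreadable or malformed record
        if not isinstance(record, dict):
            continue
        hits = matching_files(record, filename)
        if hits:
            key = os.path.splitext(os.path.basename(record_path))[0]
            owners[key] = hits
    return owners


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        print(owners_of(prefix, "libffi.so.7"))
```

Running it in a freshly created environment, before installing pytorch or torchdata, would show whether the file is already present at environment creation time.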

erip commented 1 year ago

It seems to be at environment creation time so perhaps this is an upstream issue with conda on WSL. 😦

erip commented 1 year ago

I created an issue upstream which is linked here. Hopefully they'll have some input.