Open erip opened 1 year ago
Try installing libffi from conda?
I did and it didn't help (no change in error). This is a clean install, so it seems like an issue with the conda packaging of torchdata. Note that the wheel is fine.
Do you mind checking the libffi version?
If it's installed from conda, you should be able to find it via conda list
When torchdata 0.5.1 was compiled, it used libffi 3.4.2. It seems your environment is trying to use LIBFFI_BASE_7.0.
Yes, here are the relevant details:
$ conda list | grep libffi
libffi 3.4.2 h6a678d5_6
$ python -c "import torchdata"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/__init__.py", line 7, in <module>
from torchdata import _extension # noqa: F401
File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 34, in <module>
_init_extension()
File "/home/erip/anaconda3/envs/tmp-dev/lib/python3.10/site-packages/torchdata/_extension.py", line 31, in _init_extension
from torchdata import _torchdata as _torchdata
ImportError: /usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0
It seems like the right lib is coming along for the ride, but something is causing it to be linked improperly. I tried messing with LD_LIBRARY_PATH as well, but it didn't seem to help.
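The last line of the traceback carries three separate facts: the library that failed to resolve a symbol, the symbol itself, and the version tag it expected. A minimal sketch of pulling those apart is below. The helper name `parse_undefined_symbol` and the regular expression are illustrative assumptions based only on the message format seen in this thread, not part of any library.

```python
import re

# Matches loader messages shaped like the one in the traceback:
# "<path>: undefined symbol: <name>, version <tag>" (the version part is optional).
UNDEFINED_SYMBOL = re.compile(
    r"^(?P<library>\S+): undefined symbol: (?P<symbol>\w+)"
    r"(?:, version (?P<version>\S+))?$"
)


def parse_undefined_symbol(message):
    """Return a dict with 'library', 'symbol' and 'version', or None if no match."""
    match = UNDEFINED_SYMBOL.match(message.strip())
    return match.groupdict() if match else None


message = (
    "/usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: "
    "ffi_type_pointer, version LIBFFI_BASE_7.0"
)
info = parse_undefined_symbol(message)
print(info)
```

Read this way, the failing library is the system libp11-kit rather than torchdata itself, and the tag it asks for is LIBFFI_BASE_7.0.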
TBH, I don't know. How about trying to use pip to re-install cffi as well?
It doesn't seem to help: with a force reinstall, cffi 1.15.1 gets (re)installed but the error is the same. I can dig into this a bit more soon, but the short-term workaround is to pip install torchdata.
Yeah. I think pip works simply because we statically link those packages into the wheel, which prevents this scenario of finding the wrong shared lib.
One thing I intend to do is look in $CONDA_PREFIX to see if the right version of libffi is there. I'm not sure what the offending lib is from the error message (looks like a crypto lib?), so I'm not sure how it interacts with the search path either...
After conda installing torchdata, I see the following:
$ find $CONDA_PREFIX -name "*libffi*" | xargs strings | grep LIBFFI_BASE_ | sort -u
LIBFFI_BASE_7.0
LIBFFI_BASE_7.1
LIBFFI_BASE_8.0
I think this suggests that the right libffi might not be getting bundled, though I'm not positive since 3.4.2 is clearly installed as reported by conda.
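The find/strings pipeline above merges the tags from every matching file into one list, so it does not say which file carries which tag. A rough Python sketch that keeps them per file is below. The function name `libffi_version_tags` is hypothetical, and it scans raw bytes for the same LIBFFI_BASE_ pattern rather than reading the ELF version definitions properly.

```python
import os
import re
from pathlib import Path

# Same pattern the grep above looks for, applied to raw file bytes.
VERSION_TAG = re.compile(rb"LIBFFI_BASE_\d+\.\d+")


def libffi_version_tags(prefix):
    """Map each libffi file under prefix to the sorted version tags found in it."""
    tags = {}
    for path in Path(prefix).rglob("*libffi*"):
        if not path.is_file():
            continue  # skip directories and broken symlinks
        found = {tag.decode("ascii") for tag in VERSION_TAG.findall(path.read_bytes())}
        if found:
            tags[str(path)] = sorted(found)
    return tags


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:  # only meaningful inside an activated conda environment
        for path, found in libffi_version_tags(prefix).items():
            print(path, found)
```

Symlinked names are followed, so a compatibility symlink and its target would both be listed with the same tags.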
Do you mind sharing the result of ldd torchdata/_torchdata.so?
$ conda create -n tmp -y python=3.10 -q && conda activate tmp && conda install -c pytorch torchdata -y -q
$ ldd $CONDA_PREFIX/lib/python3.10/site-packages/torchdata/_torchdata.so
linux-vdso.so.1 (0x00007ffeb9bb5000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007faad19c0000)
libz.so.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libz.so.1 (0x00007faad19a2000)
libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007faad1910000)
libssl.so.1.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libssl.so.1.1 (0x00007faad187f000)
libcrypto.so.1.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libcrypto.so.1.1 (0x00007faad15b2000)
libstdc++.so.6 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libstdc++.so.6 (0x00007faad139c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007faad124d000)
libgcc_s.so.1 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libgcc_s.so.1 (0x00007faad1233000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faad1041000)
/lib64/ld-linux-x86-64.so.2 (0x00007faad210f000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007faad1018000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007faad0ff7000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007faad0fd5000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007faad0f67000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007faad0f54000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007faad0f07000)
libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007faad0eb1000)
liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007faad0ea0000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007faad0e90000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007faad0e8a000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007faad0d08000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007faad0b32000)
libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007faad0afb000)
libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007faad0abf000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007faad0a3b000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007faad095e000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007faad092d000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007faad0926000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007faad0917000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007faad08f9000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007faad08dc000)
libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007faad0897000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007faad0874000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007faad073e000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007faad0728000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007faad071f000)
libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007faad0713000)
libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007faad0680000)
libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007faad05da000)
libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007faad05a2000)
libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007faad0587000)
libffi.so.7 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libffi.so.7 (0x00007faad0576000)
libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007faad054c000)
libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007faad053a000)
libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007faad04ec000)
libsqlite3.so.0 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libsqlite3.so.0 (0x00007faad039e000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007faad0363000)
So, it correctly finds the conda-provided libffi:
libffi.so.7 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libffi.so.7 (0x00007faad0576000)
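In the listing above, libffi.so.7 resolves inside the conda environment while libcurl.so.4 and libp11-kit.so.0 resolve under /lib/x86_64-linux-gnu. With this many entries that split is easy to miss, so here is a small sketch that groups ldd output by origin. The helper `split_ldd_output` is an illustrative name, and the sample is three lines copied from the output above.

```python
import os


def split_ldd_output(ldd_text, env_prefix):
    """Group `ldd` entries by whether they resolve inside env_prefix.

    Returns two dicts (inside, outside) mapping library name to resolved path.
    """
    prefix = os.path.normpath(env_prefix) + os.sep
    inside, outside = {}, {}
    for line in ldd_text.splitlines():
        name, arrow, rest = line.partition("=>")
        if not arrow:
            continue  # the vdso and dynamic loader lines have no "=>"
        # Drop the trailing "(0x...)" load address.
        target = rest.rsplit("(", 1)[0].strip()
        if not target.startswith("/"):
            continue  # e.g. "not found"
        resolved = os.path.normpath(target)  # collapses the "../../.." parts
        bucket = inside if resolved.startswith(prefix) else outside
        bucket[name.strip()] = resolved
    return inside, outside


sample = """\
linux-vdso.so.1 (0x00007ffeb9bb5000)
libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007faad1910000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007faad073e000)
libffi.so.7 => /home/erip/anaconda3/envs/tmp/lib/python3.10/site-packages/torchdata/../../../libffi.so.7 (0x00007faad0576000)
"""
inside, outside = split_ldd_output(sample, "/home/erip/anaconda3/envs/tmp")
print("inside env:", inside)
print("outside env:", outside)
```

Feeding it the full ldd output with $CONDA_PREFIX as the second argument would give the complete split for the environment in question.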
Could you please share if there is libffi under /lib/x86_64-linux-gnu? And, ldd /lib/x86_64-linux-gnu/libp11-kit.so.0?
Could you please share if there is libffi under /lib/x86_64-linux-gnu?
Yes, it seems like it.
$ find /lib/x86_64-linux-gnu/ -name "*libffi*"
/lib/x86_64-linux-gnu/pkgconfig/libffi.pc
/lib/x86_64-linux-gnu/libffi.so.7
/lib/x86_64-linux-gnu/libffi.a
/lib/x86_64-linux-gnu/libffi_pic.a
/lib/x86_64-linux-gnu/libffi.so
/lib/x86_64-linux-gnu/libffi.so.7.1.0
And, ldd /lib/x86_64-linux-gnu/libp11-kit.so.0?
$ ldd /lib/x86_64-linux-gnu/libp11-kit.so.0
linux-vdso.so.1 (0x00007ffc8eb97000)
libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007fb798408000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb798402000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb7983df000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb7981ed000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb798555000)
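The two ldd runs point at two different files named libffi.so.7: one inside the conda environment and one under /lib/x86_64-linux-gnu. ldd only predicts resolution for a single file, so one way to see which libffi a running Python process has actually mapped is to read /proc/self/maps. This is a Linux-only sketch, and `mapped_files` is a hypothetical helper name.

```python
import os


def mapped_files(maps_text, needle):
    """Return the sorted, distinct mapped file paths whose file name contains needle.

    maps_text is the content of a Linux /proc/<pid>/maps file.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in os.path.basename(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as handle:
        print(mapped_files(handle.read(), "libffi"))
```

Running the last three lines in the same interpreter session right after the failing `import torchdata` would show which libffi file, if any, was loaded by that point.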
I guess there might be some backward-compatibility (BC) changes between the system libffi.so.7.1.0 and the libffi.so.7 from conda.
But it's still unclear to me why conda wants to install another libffi.so.7 when there is already one on your system. Can you please check when libffi.so.7 gets installed by conda: when the new conda environment is created, or when pytorch or torchdata is installed?
It seems to be at environment creation time so perhaps this is an upstream issue with conda on WSL. 😦
I created an issue upstream which is linked here. Hopefully they'll have some input.
🐛 Describe the bug
Versions
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 528.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] torch==1.13.1
[pip3] torchdata==0.5.1
[pip3] torchtext==0.14.1
[conda] blas 1.0 mkl
[conda] mkl 2022.1.0 hc2b9512_224
[conda] pytorch 1.13.1 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h67b0de4_1 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchdata 0.5.1 pypi_0 pypi
[conda] torchtext 0.14.1 py310 pytorch