Open naibaf7 opened 1 month ago
High priority to validate and fix
In addition to fixing the issue, we should also add a verification step to the binary builds to check that no symbols from the std:: namespace are exported from PyTorch.
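A rough sketch of what such a verification step could look like, assuming GNU `nm` is available on the build machine. The function names, the prefix list, and the set of "strong" symbol types are illustrative assumptions, not an existing PyTorch check. The prefix match is a heuristic over Itanium-mangled names and will miss some std:: symbols (for example vtables or typeinfo). Weak (`W`) symbols are ignored, since template instantiations are normally emitted that way.

```python
import subprocess

# Heuristic (assumption): mangled names that start with these prefixes
# belong to the std:: namespace under the Itanium C++ ABI.
STD_PREFIXES = ("_ZNSt", "_ZNKSt", "_ZSt")

# nm type letters treated as "strongly" exported definitions (assumption):
# T = text, D = initialized data, B = BSS, R = read-only data.
STRONG_TYPES = frozenset("TDBR")


def find_strong_std_symbols(nm_output: str) -> list[str]:
    """Return std:: symbols that appear as non-weak definitions in nm output."""
    offenders = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Skip blank lines and undefined symbols (which have no address).
            continue
        _address, sym_type, name = parts
        if sym_type in STRONG_TYPES and name.startswith(STD_PREFIXES):
            offenders.append(name)
    return offenders


def check_library(path: str) -> list[str]:
    """Run `nm -D --defined-only` on a shared library and scan its output."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_strong_std_symbols(result.stdout)
```

A build script could call `check_library("libtorch_cpu.so")` and fail the build if the returned list is non-empty.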
Don't think this is unique to 2.4.0. Probably a bug in devtoolset that these symbols are not marked as weak (W).
_ZNKSt10filesystem28recursive_directory_iterator17recursion_pendingEv is not marked as hidden in gcc11-libstdc++-compat.patch, found at https://vault.centos.org/7.9.2009/sclo/Source/rh/devtoolset-11-gcc-11.2.1-9.1.el7.src.rpm
I can confirm that I see this issue with the PyTorch 2.4.0 release:
(base) atalman@ip-10-200-84-248:~/miniconda3/lib/python3.11/site-packages/torch/lib$ nm -D libtorch_cpu.so | grep "recursive_directory_iterator"
0000000014273cc0 T _ZNKSt10filesystem28recursive_directory_iterator17recursion_pendingEv
0000000014273c70 T _ZNKSt10filesystem28recursive_directory_iterator5depthEv
0000000014273c60 T _ZNKSt10filesystem28recursive_directory_iterator7optionsEv
0000000014273cd0 T _ZNKSt10filesystem28recursive_directory_iteratordeEv
00000000142771c0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator17recursion_pendingEv
0000000014277170 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator5depthEv
0000000014277160 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator7optionsEv
00000000142771d0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iteratordeEv
0000000014273ea0 T _ZNSt10filesystem28recursive_directory_iterator25disable_recursion_pendingEv
00000000142749b0 T _ZNSt10filesystem28recursive_directory_iterator3popERSt10error_code
0000000014274c40 T _ZNSt10filesystem28recursive_directory_iterator3popEv
00000000142757c0 T _ZNSt10filesystem28recursive_directory_iterator9incrementERSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC1ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC2ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD1Ev
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD2Ev
0000000014273de0 T _ZNSt10filesystem28recursive_directory_iteratoraSEOS0_
0000000014275de0 T _ZNSt10filesystem28recursive_directory_iteratorppEv
00000000142773a0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator25disable_recursion_pendingEv
0000000014277d70 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popERSt10error_code
0000000014277f40 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popEv
0000000014278aa0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator9incrementERSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC1ERKNS0_4pathENS_17directory_optionsEPSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC2ERKNS0_4pathENS_17directory_optionsEPSt10error_code
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD1Ev
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD2Ev
00000000142772e0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratoraSEOS1_
0000000014278f50 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorppEv
0000000014272ed0 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS5_
0000000014272b20 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev
00000000142764f0 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS6_
0000000014276260 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev
This does not happen with the 2.3.1 release.
It looks like this regression was introduced in this nightly: https://github.com/pytorch/pytorch/commit/529bedad3e6cbd98387b9e46ce0c0e290ed12ab5
It looks like this change introduced the use of std::filesystem by PyTorch: https://github.com/pytorch/pytorch/pull/127805 (which is a reland of https://github.com/pytorch/pytorch/pull/126601 )
Confirmed with torch final rc 2.4.1+cu121
Re-opening and re-wording as it happens in 2.5
🐛 Describe the bug
Starting with torch 2.4.0, the following libstdc++ CXX11 symbols are visible:
This will cause issues for any imported Python module that relies on a dynamically linked CXX11 libstdc++.so.
Ideally, libtorch_cpu.so and any other torch library should avoid statically linking any libstd components.
Versions
Collecting environment information...
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 20.0.0git (https://github.com/llvm/llvm-project.git 0795ab4eba14b7a93c52c06f328c3d4272f3c51e)
CMake version: version 3.29.3
Libc version: glibc-2.35
[pip3] torch==2.4.0+cpu
[pip3] torchaudio==2.4.0+cpu
[pip3] torchvision==0.19.0+cpu
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman