pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

PyTorch 2.5.0 exposes statically linked `libstdc++` CXX11 ABI symbols. #133437

Open naibaf7 opened 1 month ago

naibaf7 commented 1 month ago

🐛 Describe the bug

Starting with torch 2.4.0, the following libstdc++ CXX11 symbols are visible:

nm -D libtorch_cpu.so | grep "recursive_directory_iterator"
0000000014273cc0 T _ZNKSt10filesystem28recursive_directory_iterator17recursion_pendingEv
0000000014273c70 T _ZNKSt10filesystem28recursive_directory_iterator5depthEv
0000000014273c60 T _ZNKSt10filesystem28recursive_directory_iterator7optionsEv
0000000014273cd0 T _ZNKSt10filesystem28recursive_directory_iteratordeEv
00000000142771c0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator17recursion_pendingEv
0000000014277170 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator5depthEv
0000000014277160 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator7optionsEv
00000000142771d0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iteratordeEv
0000000014273ea0 T _ZNSt10filesystem28recursive_directory_iterator25disable_recursion_pendingEv
00000000142749b0 T _ZNSt10filesystem28recursive_directory_iterator3popERSt10error_code
0000000014274c40 T _ZNSt10filesystem28recursive_directory_iterator3popEv
00000000142757c0 T _ZNSt10filesystem28recursive_directory_iterator9incrementERSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC1ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC2ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD1Ev
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD2Ev
0000000014273de0 T _ZNSt10filesystem28recursive_directory_iteratoraSEOS0_
0000000014275de0 T _ZNSt10filesystem28recursive_directory_iteratorppEv
00000000142773a0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator25disable_recursion_pendingEv
0000000014277d70 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popERSt10error_code
0000000014277f40 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popEv
0000000014278aa0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator9incrementERSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC1ERKNS0_4pathENS_17directory_optionsEPSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC2ERKNS0_4pathENS_17directory_optionsEPSt10error_code
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD1Ev
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD2Ev
00000000142772e0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratoraSEOS1_
0000000014278f50 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorppEv
0000000014272ed0 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS5_
0000000014272b20 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev
00000000142764f0 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS6_
0000000014276260 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev

This will cause issues for any imported Python module that relies on a dynamically linked CXX11 libstdc++.so.

Ideally, libtorch_cpu.so and every other torch library should avoid statically linking any libstdc++ components.

Versions

Collecting environment information...
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 20.0.0git (https://github.com/llvm/llvm-project.git 0795ab4eba14b7a93c52c06f328c3d4272f3c51e)
CMake version: version 3.29.3
Libc version: glibc-2.35

[pip3] torch==2.4.0+cpu
[pip3] torchaudio==2.4.0+cpu
[pip3] torchvision==0.19.0+cpu

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman

malfet commented 1 month ago

High priority to validate and fix

malfet commented 1 month ago

In addition to fixing the issue, we should also add a verification step to the binary builds to check that no symbols from the std:: namespace are exported from PyTorch.
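A rough sketch of what such a verification step could look like, in Python. This is not the check used by the PyTorch build; the function names `exported_std_symbols` and `check_library` are hypothetical, the regular expression is only a heuristic for Itanium-mangled names whose outermost scope is `std::` (it is not a demangler), and the choice to flag only strong definitions and let weak (`W`) ones through is an assumption.

```python
import re
import subprocess

# Heuristic (assumption): mangled names of the form _ZSt..., _ZNSt..., _ZNKSt...
# belong to the std:: namespace. Standard abbreviations such as _ZNSs are not covered.
_STD_SYMBOL = re.compile(r"^_ZN?[rVK]*St")

# nm type letters for strong, globally visible definitions (text, data, bss, rodata).
# Weak symbols ('W', 'V') are deliberately not flagged in this sketch.
_STRONG_DEFINED = frozenset("TDBR")


def exported_std_symbols(nm_output: str) -> list[str]:
    """Return strong std:: symbols found in the text output of `nm -D`."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        _addr, sym_type, name = parts
        name = name.split("@")[0]  # drop a symbol-version suffix if present
        if sym_type in _STRONG_DEFINED and _STD_SYMBOL.match(name):
            found.append(name)
    return found


def check_library(path: str) -> list[str]:
    """Run GNU nm on a shared library and return the offending symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return exported_std_symbols(result.stdout)
```

A build script could call `check_library("libtorch_cpu.so")` and fail the build when the returned list is non-empty.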

isuruf commented 1 month ago

Don't think this is unique to 2.4.0. Probably a bug in devtoolset that these symbols are not marked as weak (W).
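To see at a glance how many of the listed symbols are strong (`T`) rather than weak (`W`), and how many carry the `__cxx11` ABI tag, the `nm -D` output can be tallied with a small Python helper. This is only an illustrative sketch: `tally_nm_symbols` is a hypothetical name, and detecting the tag by looking for the mangled component `7__cxx11` is an assumption that suits the listing in this issue.

```python
from collections import Counter


def tally_nm_symbols(nm_output: str) -> Counter:
    """Count symbols in `nm -D` text output, keyed by (type letter, ABI label)."""
    counts = Counter()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols
        _addr, sym_type, name = parts
        # "7__cxx11" is how the inline namespace __cxx11 appears in a mangled name.
        abi = "cxx11" if "7__cxx11" in name else "untagged"
        counts[(sym_type, abi)] += 1
    return counts
```

Feeding it the output quoted in the report gives one count per combination, which makes it easy to compare a 2.3.1 library against a 2.4.0 one.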

isuruf commented 1 month ago

_ZNKSt10filesystem28recursive_directory_iterator17recursion_pendingEv is not marked as hidden in gcc11-libstdc++-compat.patch found at https://vault.centos.org/7.9.2009/sclo/Source/rh/devtoolset-11-gcc-11.2.1-9.1.el7.src.rpm

atalman commented 3 weeks ago

I can confirm that I see this issue with the PyTorch 2.4.0 release:

(base) atalman@ip-10-200-84-248:~/miniconda3/lib/python3.11/site-packages/torch/lib$ nm -D libtorch_cpu.so | grep "recursive_directory_iterator"
0000000014273cc0 T _ZNKSt10filesystem28recursive_directory_iterator17recursion_pendingEv
0000000014273c70 T _ZNKSt10filesystem28recursive_directory_iterator5depthEv
0000000014273c60 T _ZNKSt10filesystem28recursive_directory_iterator7optionsEv
0000000014273cd0 T _ZNKSt10filesystem28recursive_directory_iteratordeEv
00000000142771c0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator17recursion_pendingEv
0000000014277170 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator5depthEv
0000000014277160 T _ZNKSt10filesystem7__cxx1128recursive_directory_iterator7optionsEv
00000000142771d0 T _ZNKSt10filesystem7__cxx1128recursive_directory_iteratordeEv
0000000014273ea0 T _ZNSt10filesystem28recursive_directory_iterator25disable_recursion_pendingEv
00000000142749b0 T _ZNSt10filesystem28recursive_directory_iterator3popERSt10error_code
0000000014274c40 T _ZNSt10filesystem28recursive_directory_iterator3popEv
00000000142757c0 T _ZNSt10filesystem28recursive_directory_iterator9incrementERSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC1ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014274fe0 T _ZNSt10filesystem28recursive_directory_iteratorC2ERKNS_4pathENS_17directory_optionsEPSt10error_code
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD1Ev
0000000014273bd0 T _ZNSt10filesystem28recursive_directory_iteratorD2Ev
0000000014273de0 T _ZNSt10filesystem28recursive_directory_iteratoraSEOS0_
0000000014275de0 T _ZNSt10filesystem28recursive_directory_iteratorppEv
00000000142773a0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator25disable_recursion_pendingEv
0000000014277d70 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popERSt10error_code
0000000014277f40 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator3popEv
0000000014278aa0 T _ZNSt10filesystem7__cxx1128recursive_directory_iterator9incrementERSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC1ERKNS0_4pathENS_17directory_optionsEPSt10error_code
0000000014278370 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorC2ERKNS0_4pathENS_17directory_optionsEPSt10error_code
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD1Ev
00000000142770d0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorD2Ev
00000000142772e0 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratoraSEOS1_
0000000014278f50 T _ZNSt10filesystem7__cxx1128recursive_directory_iteratorppEv
0000000014272ed0 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS5_
0000000014272b20 W _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev
00000000142764f0 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS6_
0000000014276260 W _ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev

This does not happen with the 2.3.1 release.

It looks like the regression was introduced in this nightly: https://github.com/pytorch/pytorch/commit/529bedad3e6cbd98387b9e46ce0c0e290ed12ab5

malfet commented 3 weeks ago

Looks like this change introduced the use of std::filesystem by PyTorch: https://github.com/pytorch/pytorch/pull/127805 (which is a reland of https://github.com/pytorch/pytorch/pull/126601)

atalman commented 3 weeks ago

Confirmed with torch final rc 2.4.1+cu121

malfet commented 3 days ago

Re-opening and re-wording as it happens in 2.5