Open msdrigg opened 1 year ago
I tried all solutions mentioned in https://github.com/tensorflow/tensorflow/issues/34742, and nothing worked. My final attempt was bazel build --compilation_mode=opt --jobs=25 --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=monolithic tensorflow and it still did not solve the problem.
Are you pointing Rust to the TensorFlow library you built? There are instructions on how to do that at https://github.com/tensorflow/rust/blob/master/tensorflow-sys/README.md#manual-tensorflow-compilation.
Yes, I moved the compiled objects into /usr/local/lib and ran ldconfig on the directory.
+1 for me. All I do is open an HTTP connection with the reqwest crate and it crashes. The request is totally unrelated to tensorflow, but somehow tensorflow now takes over the openssl symbols, as the backtrace shows:
* frame #0: 0x00007fffcc6969fc libc.so.6`__GI___pthread_kill at pthread_kill.c:44:76
frame #1: 0x00007fffcc6969b0 libc.so.6`__GI___pthread_kill [inlined] __pthread_kill_internal(signo=6, threadid=140737314203328) at pthread_kill.c:78:10
frame #2: 0x00007fffcc6969b0 libc.so.6`__GI___pthread_kill(threadid=140737314203328, signo=6) at pthread_kill.c:89:10
frame #3: 0x00007fffcc642476 libc.so.6`__GI_raise(sig=6) at raise.c:26:13
frame #4: 0x00007fffcc6287f3 libc.so.6`__GI_abort at abort.c:79:7
frame #5: 0x00007fffcc689676 libc.so.6`__libc_message(action=do_abort, fmt="\U00000010") at libc_fatal.c:155:5
frame #6: 0x00007fffcc6a0cfc libc.so.6`malloc_printerr(str=<unavailable>) at malloc.c:5664:3
frame #7: 0x00007fffcc6a2a44 libc.so.6`_int_free(av=<unavailable>, p=<unavailable>, have_lock=0) at malloc.c:4439:5
frame #8: 0x00007fffcc6a5453 libc.so.6`__GI___libc_free(mem=<unavailable>) at malloc.c:3391:7
frame #9: 0x00007ffff70a1c9a libtensorflow_framework.so.2`bssl::ssl_crypto_x509_ssl_ctx_free(ssl_ctx_st*) + 58
frame #10: 0x00007ffff7094f86 libtensorflow_framework.so.2`ssl_ctx_st::~ssl_ctx_st() + 70
frame #11: 0x00007ffff7095456 libtensorflow_framework.so.2`SSL_CTX_free + 38
frame #12: 0x00005555565c990e program`_$LT$openssl..ssl..SslContext$u20$as$u20$core..ops..drop..Drop$GT$::drop::he1e1bafd7778b929(self=0x00007fffcbe22000) at lib.rs:241:26
frame #13: 0x00005555565d58da program`core::ptr::drop_in_place$LT$openssl..ssl..SslContext$GT$::h8483f3eb796b6aee((null)=0x00007fffcbe22000) at mod.rs:497:1
frame #14: 0x000055555600318b program`core::ptr::drop_in_place$LT$openssl..ssl..connector..SslConnector$GT$::ha52c6b5831405ca0((null)=0x00007fffcbe22000) at mod.rs:497:1
frame #15: 0x000055555600316b program`core::ptr::drop_in_place$LT$native_tls..imp..TlsConnector$GT$::h898a325e5e6a2390((null)=0x00007fffcbe22000) at mod.rs:497:1
frame #16: 0x000055555600315b program`core::ptr::drop_in_place$LT$native_tls..TlsConnector$GT$::h05dcf2f2ec19f859((null)=0x00007fffcbe22000) at mod.rs:497:1
frame #17: 0x0000555555f4215c program`core::ptr::drop_in_place$LT$reqwest..connect..Inner$GT$::h1ccf2f0fb635dba6((null)=0x00007fffcbe21fe8) at mod.rs:497:1
frame #18: 0x0000555555f425cb program`core::ptr::drop_in_place$LT$reqwest..connect..Connector$GT$::hbfb676efb078b00f((null)=0x00007fffcbe21fd8) at mod.rs:497:1
frame #19: 0x0000555555f3a42e program`core::ptr::drop_in_place$LT$hyper_util..client..legacy..client..Client$LT$reqwest..connect..Connector$C$reqwest..async_impl..body..Body$GT$$GT$::h6a79d474bb243160((null)=0x00007fffcbe21f10) at mod.rs:497:1
frame #20: 0x0000555555f4346c program`core::ptr::drop_in_place$LT$reqwest..async_impl..client..ClientRef$GT$::h621ffac56c4ab15f((null)=0x00007fffcbe21f10) at mod.rs:497:1
frame #21: 0x0000555555f0ee3f program`alloc::sync::Arc$LT$T$C$A$GT$::drop_slow::h6322ecbb95a2aa20(self=0x00007fffffff3540) at sync.rs:1751:18
frame #22: 0x0000555555f147e5 program`_$LT$alloc..sync..Arc$LT$T$C$A$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::h25fd65b8fc0fe2ed(self=0x00007fffffff3540) at sync.rs:2407:13
frame #23: 0x0000555555f4485b program`core::ptr::drop_in_place$LT$alloc..sync..Arc$LT$reqwest..async_impl..client..ClientRef$GT$$GT$::h15126958e085d642((null)=0x00007fffffff3540) at mod.rs:497:1
frame #24: 0x0000555555f431db program`core::ptr::drop_in_place$LT$reqwest..async_impl..client..Client$GT$::h9be9cca15fbe82e9((null)=0x00007fffffff3540) at mod.rs:497:1
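The telling part of the backtrace above is frames #9 to #11, where SSL_CTX_free and its callees are served by libtensorflow_framework.so.2 rather than by the system libssl. For longer traces, a small filter can pick those frames out. This is a minimal sketch that assumes the LLDB frame layout shown above (address, then module, a backtick, then the symbol); the names Frame, parse_frame and ssl_frames_in_tensorflow are hypothetical and not part of any crate.

```rust
/// One parsed LLDB frame: the module (shared object or binary) and the symbol text.
#[derive(Debug, PartialEq)]
struct Frame<'a> {
    module: &'a str,
    symbol: &'a str,
}

/// Parses a line such as
/// "frame #11: 0x00007ffff7095456 libtensorflow_framework.so.2`SSL_CTX_free + 38".
/// Returns None for lines that do not follow that layout.
fn parse_frame(line: &str) -> Option<Frame<'_>> {
    // Skip everything up to and including "frame #N: ".
    let (_, rest) = line.split_once("frame #")?;
    let (_, rest) = rest.split_once(": ")?;
    // Drop the address, keeping "<module>`<symbol ...>".
    let (_, rest) = rest.split_once(' ')?;
    let (module, symbol) = rest.split_once('`')?;
    Some(Frame {
        module,
        symbol: symbol.trim(),
    })
}

/// Returns the frames whose symbol mentions "ssl" (case-insensitive)
/// while the module is a TensorFlow shared library.
fn ssl_frames_in_tensorflow(backtrace: &str) -> Vec<Frame<'_>> {
    backtrace
        .lines()
        .filter_map(parse_frame)
        .filter(|frame| frame.module.contains("libtensorflow"))
        .filter(|frame| frame.symbol.to_ascii_lowercase().contains("ssl"))
        .collect()
}
```

Feeding it the saved backtrace text should list only the frames in which an SSL symbol lives inside a TensorFlow library; frames from the program itself or from libc are ignored.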
ldd target/debug/program
linux-vdso.so.1 (0x00007ffef968b000)
libtensorflow_framework.so.2 => /usr/local/lib/libtensorflow_framework.so.2 (0x0000783696400000)
libtensorflow.so.2 => /usr/local/lib/libtensorflow.so.2 (0x000078366de00000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000078369895c000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000078366d800000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000078366d400000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x000078369ba1f000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000078369b9ff000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000078366dd19000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000078366d000000)
/lib64/ld-linux-x86-64.so.2 (0x000078369ba80000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000078369b9f8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000078369b9f3000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000078369b9ee000)
I worked around this by placing libssl and libcrypto before tensorflow in the library order shown above. Create a build.rs with this code:
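use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Link the system libssl and libcrypto explicitly so that they are
    // listed ahead of the tensorflow libraries.
    println!("cargo:rustc-link-lib=dylib=ssl");
    println!("cargo:rustc-link-lib=dylib=crypto");
    Ok(())
}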
and note that libssl and libcrypto are now listed before libtensorflow, so the program never tries to use tensorflow's statically linked ssl:
linux-vdso.so.1 (0x00007fffbb3a9000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x0000772aa0f5c000) <-- here
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x0000772aa0a00000) <-- here
libtensorflow_framework.so.2 => /usr/local/lib/libtensorflow_framework.so.2 (0x0000772a9b400000)
libtensorflow.so.2 => /usr/local/lib/libtensorflow.so.2 (0x0000772a76000000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000772a75c00000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x0000772aa401c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000772aa3ffc000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000772aa0e75000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000772a75800000)
/lib64/ld-linux-x86-64.so.2 (0x0000772aa407d000)
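To check the ordering without reading the listing by eye, the ldd output can be scanned programmatically. This is a minimal sketch that assumes the ldd format shown in the two listings above and, like the workaround itself, treats the listing order as the order in which symbols are looked up; the function name ssl_precedes_tensorflow is hypothetical.

```rust
/// Reports whether libssl is listed before libtensorflow_framework in `ldd` output.
///
/// Returns Some(true) if libssl comes first, Some(false) if it comes later,
/// and None if either library is missing from the listing.
fn ssl_precedes_tensorflow(ldd_output: &str) -> Option<bool> {
    // Index of the first line whose library name starts with `name`.
    let position = |name: &str| {
        ldd_output
            .lines()
            .position(|line| line.trim_start().starts_with(name))
    };
    let ssl = position("libssl.so")?;
    let tensorflow = position("libtensorflow_framework.so")?;
    Some(ssl < tensorflow)
}
```

Applied to the first listing above this should yield Some(false), and Some(true) for the listing produced after adding build.rs.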
It's still broken for unit tests because, to my knowledge, there's no way to enforce the linking order in tests.
Ideally libtensorflow would never statically link openssl and would instead let the binary choose its own libssl.
So I recently added tensorflow to a rust project that had an external openssl dependency (reqwest and paho-mqtt) and I immediately started seeing segfaults. The strange thing is that these segfaults are coming from crypto functions being called in the tensorflow_framework.so.2 library from paho-mqtt (SSLSocket_initialize in the core dump shown below). If I remove the paho-mqtt dependency on ssl, I see similar things with reqwest.
Relevant Logs
This backtrace reliably occurs every time I run my program.
Interestingly, here's what I see from ldd. Note that libssl.so.3 does correctly point to the real openssl, so I don't know why at runtime it gets linked to tensorflow_framework.so.2
Note: I am using the latest rust versions and the latest versions of all packages mentioned here. Here's what my uname -a output looks like:
Prior Art
The only other mention of this issue I could find was here https://github.com/tensorflow/tensorflow/issues/34742, and I am currently trying to resolve my problem using the steps outlined in that issue.
Goals
A perfect fix would be for me to be able to seamlessly use tensorflow and openssl in a project without any tweaks, but I would consider this issue closed for me if we could find some workaround (environment variables, a build script or something similar) so that I could make my project run without segfaulting.