Open Mizux opened 1 month ago
Some update: This weekend I was able to reproduce it with a pet project without involving pybind11 etc... EDIT: same issue with "v28.0-rc1" too
It seems the issue comes from the shared library nature...
here the project layout:
graph TD;
A[bin/App\nbinary]-->FB[lib/libFooBar.so\nlibrary];
FB-->F[lib/libFoo.so\nlibrary];
FB-->B[lib/libBar.so\nlibrary];
F-.->Pc[C.proto];
Pc-- "import" -->Pa[A.proto];
Pc-- "import" -->Pb[B.proto];
ref: https://github.com/mizux/bazel-proto
% cmake -S. -Bbuild
% cmake --build build -j
% ./build/bin/App
...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721036258.637555 1559662 generated_message_reflection.cc:3619] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/App
if I apply this patch to make SHARED
libraries to be STATIC
:
diff --git a/foo/CMakeLists.txt b/foo/CMakeLists.txt
index aee9266..c7476a1 100644
--- a/foo/CMakeLists.txt
+++ b/foo/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(Foo) # shared implicit
+add_library(Foo STATIC "")
get_cpp_proto(PROTO_HDRS PROTO_SRCS)
target_sources(Foo
diff --git a/foobar/CMakeLists.txt b/foobar/CMakeLists.txt
index 724f2e1..d4fe85a 100644
--- a/foobar/CMakeLists.txt
+++ b/foobar/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(FooBar "") # shared implicit
+add_library(FooBar STATIC "")
target_sources(FooBar PRIVATE foobar.h foobar.cc)
target_include_directories(FooBar
PUBLIC
% cmake --build build -j
% ./build/bin/App
...
(exit 0)
i.e. App embedded the libFoo.a
then test pass, aka bin/App
run without error...
So I suspect the Protobuf commit to break the shared library mode.
Back to google/or-tools when generating our wheel package we have the following layout:
ortools
├── __init__.py
├── .libs
│ └── libortools.so.10 (48Mo)
├── algorithms
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── knapsack_solver.cpython-312-x86_64-linux-gnu.so
│ └── knapsack_solver.pyi
├── graph
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── max_flow.cpython-312-x86_64-linux-gnu.so
│ ├── max_flow.pyi
│ ├── min_cost_flow.cpython-312-x86_64-linux-gnu.so
│ ├── min_cost_flow.pyi
│ ├── linear_sum_assignment.cpython-312-x86_64-linux-gnu.so
│ └── linear_sum_assignment.pyi
├── routing
│ ├── enums_pb2.py
│ ├── enums_pb2.pyi
│ ├── ils_pb2.py
│ ├── ils_pb2.pyi
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── model.cpython-312-x86_64-linux-gnu.so (2Mo)
│ ├── model.pyi
├── sat
...
graph TD;
D_O[ortools];
D_O-->D_L[.libs];
D_L-->OR[(libortools.so\nortools/.lib)];
D_O-->D_A[algorithms];
D_A-->D_AP[python];
D_AP-->A[(algorithms.cpython-312-x86_64-linux-gnu.so\nortools/algorithms/python)];
A-. link .->OR;
D_O-->D_G[graph];
D_G-->D_GP[python];
D_GP-->G1[(max_flow.cpython-312-x86_64-linux-gnu.so\nortools/graph/python)];
D_GP-->G2[(min_cost_flow.cpython-312-x86_64-linux-gnu.so\nortools/graph/python)];
G1-. link .->OR;
G2-. link .->OR;
D_O-->D_R[routing];
D_R-->D_RP[python];
D_RP-->R[(model.cpython-312-x86_64-linux-gnu.so\nortools/routing/python)];
R-. link .->OR;
D_O-->D_X[component];
D_X-->D_XP[python];
D_XP-->X[(component.cpython-312-x86_64-linux-gnu.so\nortools/<component>/python)];
X-. link .->OR;
note: we are using RUNPATH for each *.cpython.so
lib to find the libortools.so
.
objdump -p ortools/algorithms/python/knapsack_solver.cpython-312-x86_64-linux-gnu.so | grep RUNPATH
RUNPATH $ORIGIN:$ORIGIN/../../../ortools/.libs
using a shared lib libortools.so avoid to have to duplicate symbols in each python "sub" module. note: libortools.so it's 48Mo, since we have a dozen of python modules so 48Mo * 12 is a no go (pypi limit is 200Mo per wheel packages...)
Dev Note playing with http://github.com/mizux/bazel-proto and using a hackish version of protobuf 58baeb4 vs 49c83ab
Adding patch:
diff --git a/src/google/protobuf/generated_message_reflection.cc b/src/google/protobuf/generated_message_reflection.cc
index d0543d134..450cf589d 100644
--- a/src/google/protobuf/generated_message_reflection.cc
+++ b/src/google/protobuf/generated_message_reflection.cc
@@ -3642,6 +3642,7 @@ void AssignDescriptorsImpl(const DescriptorTable* table, bool eager) {
}
// Fill the arrays with pointers to descriptors and reflection classes.
+ std::cerr << "filename: '" << table->filename << "'\n";
const FileDescriptor* file =
DescriptorPool::internal_generated_pool()->FindFileByName(
table->filename);
Using 58baeb4 :
./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
filename: 'foo/A.proto'
filename: 'foo/B.proto'
a {
name: "42"
}
b {
value: 42
}
Using 49c83ab :
./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721046248.069280 1828260 generated_message_reflection.cc:3623] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/App
Small update, buiding Protobuf in SHARED make the bazel-proto tests (e.g. foo_test) pass...
diff --git a/cmake/dependencies/CMakeLists.txt b/cmake/dependencies/CMakeLists.txt
index 7bf92be..3b712ba 100644
--- a/cmake/dependencies/CMakeLists.txt
+++ b/cmake/dependencies/CMakeLists.txt
@@ -52,13 +52,15 @@ if(BUILD_Protobuf)
message(CHECK_START "Fetching Protobuf")
list(APPEND CMAKE_MESSAGE_INDENT " ")
set(protobuf_BUILD_TESTS OFF)
- set(protobuf_BUILD_SHARED_LIBS OFF)
+ set(protobuf_BUILD_SHARED_LIBS ON)
set(protobuf_BUILD_EXPORT OFF)
set(protobuf_MSVC_STATIC_RUNTIME OFF)
#set(protobuf_BUILD_LIBUPB ON)
i.e.
BAD: bin/foo_test
-> lib/libFoo.so
--static link--> libprotobuf.a
GOOD: bin/foo_test
-> lib/libFoo.so
--ldd--> libprotobuf.so
Adding some log (thx to @sbenzaquen for the suggestion). Seems we have an ODR violation when using a static version of Protobuf
The patch in Protobuf to have some insight...
git diff HEAD~1
diff --git a/src/google/protobuf/generated_message_util.cc b/src/google/protobuf/generated_message_util.cc
index e33d22fc4..7b9663d43 100644
--- a/src/google/protobuf/generated_message_util.cc
+++ b/src/google/protobuf/generated_message_util.cc
@@ -88,18 +88,23 @@ void InitWeakDefaults() {}
PROTOBUF_CONSTINIT std::atomic<bool> init_protobuf_defaults_state{false};
static bool InitProtobufDefaultsImpl() {
+ std::cerr << __func__ << ":" << __LINE__ << " call...\n";
fixed_address_empty_string.DefaultConstruct();
OnShutdownDestroyString(fixed_address_empty_string.get_mutable());
InitWeakDefaults();
init_protobuf_defaults_state.store(true, std::memory_order_release);
+ std::cerr << __func__ << ":" << __LINE__ << " done.\n";
return true;
}
void InitProtobufDefaultsSlow() {
+ std::cerr << __func__ << ":" << __LINE__ << " call...\n";
static bool is_inited = InitProtobufDefaultsImpl();
(void)is_inited;
+ std::cerr << "is_inited: " << &is_inited << ".\n";
+ std::cerr << __func__ << ":" << __LINE__ << "done.\n";
}
// Force the initialization of the empty string.
// Normally, registration would do it, but we don't have any guarantee that
The trace of build/bin/foo_test when using libprotobuf.a patched and with the commit @49c83ab
./build/bin/foo_test
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x7ff270270e08.
InitProtobufDefaultsSlow:107done.
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x5641f8051b48.
InitProtobufDefaultsSlow:107done.
Running main() from /usr/local/google/home/corentinl/dev/repo/bazel-proto/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from FooTest
[ RUN ] FooTest.ProtoFunction
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721369740.493581 753437 generated_message_reflection.cc:3622] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/foo_test
note: same ODR violation prior to the patch (i.e. protobuf v26.1) but without the assert fail side effect thus it was unnoticed until now...
For google/or-tools we tried to upgrade from v26.1 (working) to v27.2 (broken)
note: release v27.0 and v27.1 are also broken...
Language: Python (native C++ module) i.e. pybind11 wrappers....
What operating system (Linux, Windows, ...) and version? Linux and MacOS are impacted...
What runtime / compiler are you using (e.g., python version or gcc version) From Python 3.9 to Python 3.12 gcc ~[10,14] see default version from almalinux:latest, rockylinux:9, fedora:latest, debian:latest, opensuse, ubuntu:latest .... ref: https://repology.org/project/gcc/badges. note: we are targeting C++17 (with agregate initialisation) or C++20 (MSVC)
What did you do? Please read this issue for a full detailed bug report:
What did you expect to see all or-tools tests pass as before...
What did you see instead? all python tests using pybind11 and protobuf failed with error:
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs). Again, Please read the issue for a full detailed bug report:
Anything else we should know about your project / environment using kind of bissect on protobuf tag use in our build, I figure out it is these commits (split in two parts
-_-
) are at fault:For Googlers: it is cl/602725967
note: git revert both commit on top of v27.2 tag was not working (too many conflict to make it...)