google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0
10.96k stars 2.1k forks source link

[cmake] Using Protobuf v27 make python pybind11 test failed #4307

Open Mizux opened 1 month ago

Mizux commented 1 month ago

Tests impacted

here the list of impacted tests...

To test:

cmake -S. -Bbuild -DBUILD_PYTHON=ON -DBUILD_CXX_SAMPLES=OFF -DBUILD_CXX_EXAMPLES=OFF
cmake --build build -j 8
(cd build && ctest -R "python_")
...
96% tests passed, 13 tests failed out of 332

Total Test time (real) = 349.67 sec

The following tests FAILED:
    11 - python_pdlp_pdlp_test (Subprocess aborted)
    14 - python_routing_model_test (Subprocess aborted)
    19 - python_scheduling_rcpsp_test (Subprocess aborted)
    21 - python_math_opt_solver_test (Subprocess aborted)
    170 - python_math_opt_advanced_linear_programming (Subprocess aborted)
    171 - python_math_opt_basic_example (Subprocess aborted)
    172 - python_math_opt_cutting_stock (Subprocess aborted)
    173 - python_math_opt_integer_programming (Subprocess aborted)
    174 - python_math_opt_linear_programming (Subprocess aborted)
    175 - python_math_opt_linear_regression (Subprocess aborted)
    176 - python_math_opt_time_indexed_scheduling (Subprocess aborted)
    178 - python_pdlp_simple_pdlp_program (Subprocess aborted)
    439 - python_python_rcpsp_sat (Subprocess aborted)
Errors while running CTest
Output from these tests are in: /usr/local/google/home/corentinl/work/main/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

note: using Protobuf 58baeb4c3b664f8918d24cef5151083d9da9767c everything pass... Regression seem to have been introduced by http://cl/602725967

Investigation

Testing one broken test

cmake -S. -Bbuild -DBUILD_PYTHON=ON -DBUILD_CXX_SAMPLES=OFF -DBUILD_CXX_EXAMPLES=OFF
cmake --build build -j 8
source build/python/venv/bin/activate
python ortools/routing/python/model_test.py

Trace:

Running tests under Python 3.12.2: /usr/local/google/home/corentinl/work/main/build/python/venv/bin/python
[ RUN      ] ModelTest.testConstantDimensionTSP
[       OK ] ModelTest.testConstantDimensionTSP
[ RUN      ] ModelTest.testCtor
<ortools.routing.python.model.RoutingModel object at 0x7faea8f9d4b0>
[       OK ] ModelTest.testCtor
[ RUN      ] ModelTest.testDimensionTSP
[       OK ] ModelTest.testDimensionTSP
[ RUN      ] ModelTest.testDimensionWithVehicleCapacitiesTSP
[       OK ] ModelTest.testDimensionWithVehicleCapacitiesTSP
[ RUN      ] ModelTest.testDimensionWithVehicleTransitsTSP
[       OK ] ModelTest.testDimensionWithVehicleTransitsTSP
[ RUN      ] ModelTest.testDimensionWithVehicleTransitsVRP
[       OK ] ModelTest.testDimensionWithVehicleTransitsVRP
[ RUN      ] ModelTest.testDisjunctionPenaltyTSP
[       OK ] ModelTest.testDisjunctionPenaltyTSP
[ RUN      ] ModelTest.testDisjunctionTSP
[       OK ] ModelTest.testDisjunctionTSP
[ RUN      ] ModelTest.testMatrixDimensionTSP
[       OK ] ModelTest.testMatrixDimensionTSP
[ RUN      ] ModelTest.testMatrixDimensionVRP
[       OK ] ModelTest.testMatrixDimensionVRP
[ RUN      ] ModelTest.testRoutingLocalSearchFiltering
First solution statistics:
                  | Time (s)
PATH_CHEAPEST_ARC | 3.3e-05
Local search filter statistics (PATH_CHEAPEST_ARC):
                      |     Calls |   Rejects | Time (s) | Rejects/s
VehicleVariableFilter |        12 |         0 | 8.3e-06  |       0
 VariableDomainFilter |        12 |         0 | 7.8e-06  |       0
                Total |        24 |         0 | 1.6e-05  |       0

[       OK ] ModelTest.testRoutingLocalSearchFiltering
[ RUN      ] ModelTest.testRoutingModelParameters
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1720594007.067703 3589844 generated_message_reflection.cc:3620] Check failed: file != nullptr 
*** Check failure stack trace: ***
Fatal Python error: Aborted

Current thread 0x00007faeaa0d0040 (most recent call first):
  File "/usr/local/google/home/corentinl/work/main/ortools/routing/python/model_test.py", line 650 in testRoutingModelParameters
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/runner.py", line 240 in run
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/_pretty_print_reporter.py", line 82 in run
  File "/usr/lib/python3.12/unittest/main.py", line 281 in runTests
  File "/usr/lib/python3.12/unittest/main.py", line 105 in __init__
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/absltest.py", line 2653 in _run_and_get_tests_result
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/absltest.py", line 2689 in run_tests
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/absltest.py", line 2234 in main_function
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/app.py", line 254 in _run_main
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/app.py", line 308 in run
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/absltest.py", line 2236 in _run_in_app
  File "/usr/local/google/home/corentinl/work/main/build/python/venv/lib/python3.12/site-packages/absl/testing/absltest.py", line 2131 in main
  File "/usr/local/google/home/corentinl/work/main/ortools/routing/python/model_test.py", line 675 in <module>

Extension modules: google._upb._message (total: 1)
zsh: IOT instruction  python ortools/routing/python/model_test.py

https://github.com/google/or-tools/blob/f9adb26d1df687ce2362e864ee713f2db00897bc/ortools/routing/python/model_test.py#L648-L651

Protobuf Version Python Test
v26.1 Pass
v27.0 Failed
v27.1 Failed
v27.2 Failed

continue to investigate...

ref: https://github.com/protocolbuffers/protobuf/blob/63def39e881afa496502d9c410f4ea948e59490d/src/google/protobuf/generated_message_reflection.cc#L3616-L3620

Mizux commented 1 month ago

Custom Protobuf

cloning Protobuf locally and playing with SHA1/patch

git clone git@github.com:protocolbuffers/protobuf.git
cd protobuf
# reset to the commit to test
git reset --hard d21425d334f965650c3c66362b0d827797f059db

# Change -dev version suffix using v27.0 commit otherwise at runtime protobuf check version suffix and refuse to start
# we use tig to select the commit to cherry pick
tig v27.0
# inside tig press `C` to cherrypick, `Y` to confirm, then quit with `q`
# fix patch using mergetool
git mergetool
# remove .orig
git clean -n
git clean -f

# Patch CMake stuff
# note: could be protobuf-v27.patch
git apply --3way ~/work/main/patches/protobuf-v26.patch

modify or-tools

diff --git a/cmake/dependencies/CMakeLists.txt b/cmake/dependencies/CMakeLists.txt
index a5cfdeee13..fbd75fe544 100644
--- a/cmake/dependencies/CMakeLists.txt
+++ b/cmake/dependencies/CMakeLists.txt
@@ -106,13 +106,21 @@ if(BUILD_Protobuf)
   #set(protobuf_BUILD_LIBUPB ON)
   FetchContent_Declare(
     Protobuf
-    GIT_REPOSITORY "https://github.com/protocolbuffers/protobuf.git"
+    SOURCE_DIR "${CMAKE_SOURCE_DIR}/pb"

-    GIT_SHALLOW TRUE
+    #GIT_SHALLOW TRUE
     GIT_SUBMODULES ""

-    PATCH_COMMAND git apply --ignore-whitespace "${CMAKE_CURRENT_LIST_DIR}/../../patches/protobuf-v27.patch")
+    #PATCH_COMMAND git apply --ignore-whitespace "${CMAKE_CURRENT_LIST_DIR}/../../patches/protobuf-v27.patch"
+    #PATCH_COMMAND git apply --ignore-whitespace "${CMAKE_CURRENT_LIST_DIR}/../../patches/protobuf-v26.patch"
+  )
   FetchContent_MakeAvailable(Protobuf)
   list(POP_BACK CMAKE_MESSAGE_INDENT)
   message(CHECK_PASS "fetched")

note: I've use a symbolic link in the root dir of my or-tools local repo. ln -s <path_to_hacked_protobuf> pb

In or-tools:

# clean protobuf build
rm -rf build/_deps/protobuf-build build/_deps/protobuf-src build/_deps/protobuf-subbuild 
# rebuild
cmake --build build  -j 8

# test
source build/python/venv/bin/activate
python ortools/routing/python/model_test.py

Dev Note

Rename the 'includingDefaultValueWithoutPresenceFields' and 'always_print_without_presence_fields' to 'alwaysPrintFieldsWithNoPresence':

You also, may need to revert this commit e0a4dcf5a082e7f90b73708fc7ff4a5e4760ed85 or apply this diff

diff --git a/ortools/util/file_util.cc b/ortools/util/file_util.cc
index db638ca8f9..fcb661fcf1 100644
--- a/ortools/util/file_util.cc
+++ b/ortools/util/file_util.cc
@@ -165,7 +165,7 @@ absl::Status WriteProtoToFile(absl::string_view filename,
     case ProtoWriteFormat::kJson: {
       google::protobuf::util::JsonPrintOptions options;
       options.add_whitespace = true;
-      options.always_print_fields_with_no_presence = true;
+      options.always_print_primitive_fields = true;
       options.preserve_proto_field_names = true;
       if (!google::protobuf::util::MessageToJsonString(proto, &output_string,
                                                        options)

if you are before this commit:

Fun Fact: While release v26.1 contains this patch, it is NOT available in the v26-dev tag... https://github.com/protocolbuffers/protobuf/blob/v26.1/src/google/protobuf/json/json.h#L46 https://github.com/protocolbuffers/protobuf/blob/v26-dev/src/google/protobuf/json/json.h#L46

See only commit between a and b[

To only list commit [a, b[ you can use:

tig a...b

Results

Using a bisect approach to find the first borken commit(s)...

Protobuf Version CMake Patch Test Date
v27.2 protobuf-v27.patch FAIL 2024
v27.1 protobuf-v27.patch FAIL 2024
v27.0 protobuf-v27.patch FAIL 2024/05/22
... ... ... ...
<v27-dev> protobuf-v27.patch FAIL 2024/04/17
... ... ... ...
01312f9c34a779f2a4e3327822b33d85f7a64002 protobuf-v26.patch FAIL 2024/02/06
... ... ... ...
4687ef335f29736fa0ff83d6421ea2bbdbdf1aa5 protobuf-v26.patch FAIL 2024/02/01
... ... ... ...
21ab7459ee43c0d5079e9f7708d453f68a5fd0e5 protobuf-v26.patch FAIL 2024/01/31
... ... ... ...
6f1d88107f268b8ebdad6690d116e74c403e366e protobuf-v26.patch FAIL 2024/01/30
... ... ... ...
4df6e042eca08159ee395853d8c250469a35db72 protobuf-v26.patch FAIL 2024/01/30
... ... ... ...
f4511fda5a0ad44e84848077da83f2ae9b21830e protobuf-v26.patch FAIL 2024/01/30
... ... ... ...
49c83ab799977592744f50df93957ddec1e12e70 protobuf-v26.patch FAIL 2024/01/30
abc9bae7c19d9fff4b0be0ffd3a01f947d3b487a protobuf-v26.patch NA(0) 2024/01/30
58baeb4c3b664f8918d24cef5151083d9da9767c protobuf-v26.patch PASS 2024/01/30
... ... ... ...
d21425d334f965650c3c66362b0d827797f059db protobuf-v26.patch PASS 2024/01/30
... ... ... ...
9146ce6ddba3b74ad16a0f440f9cf9d73ba282c7 protobuf-v26.patch PASS 2024/01/25
... ... ... ...
<v26-dev> protobuf-v26.patch NA(1)
  1. https://github.com/protocolbuffers/protobuf/commit/49c83ab799977592744f50df93957ddec1e12e70 contains auto generated code after the commit https://github.com/protocolbuffers/protobuf/commit/abc9bae7c19d9fff4b0be0ffd3a01f947d3b487a so I guess this one is not atomic and can't be tested...
  2. compile fail didn't manage to generate a binary, but v26-dev is not in the v26.1 which contains commit after this tag
Mizux commented 1 month ago

TLDR: using Protobuf (cmake based build) as a "Static" library will lead to an ODR violation, until now this ODR was without impact/side effect but the Protobuf change this behaviour and make it crash the runtime.

magneticflux- commented 1 month ago

Also seems to break fzn-cp-sat:

F0801 20:51:14.314146 2793569 generated_message_reflection.cc:3620] Check failed: file != nullptr