google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0

CP-SAT Occasional Segmentation Fault #4400

Closed (closed by willf-j 1 day ago)

willf-j commented 4 days ago

What version of OR-Tools and what language are you using?
Version: 9.11.4210
Language: Python

Which solver are you using (e.g. CP-SAT, Routing Solver, GLOP, BOP, Gurobi)?
CP-SAT

What operating system (Linux, Windows, ...) and version?
Linux; Ubuntu 22.04

What did you do?
Ran a basic Python script that loads the model and solves it.

What did you expect to see?
The model runs without issue and converges after a few minutes to an objective around 30000. Proving optimality takes considerably longer on this model.

What did you see instead?
A segmentation fault within roughly the first 20 seconds after presolve. It happens about 10% of the time.

Make sure you include information that can help us debug (full error message, model Proto).

user$ python run_model.py cp_sat_model.proto
Model loaded successfully from cp_sat_model.proto

Starting CP-SAT solver v9.11.4210
Parameters: log_search_progress: true
Setting number of workers to 32
.
.
.
#Bound   9.43s best:inf   next:[19048,10000000] fs_random
#Bound   9.54s best:inf   next:[19088,10000000] fs_random
#Bound   9.66s best:inf   next:[19116,10000000] fs_random
#Bound  11.17s best:inf   next:[19130,10000000] fs_random
#Model  11.99s var:2265/2267 constraints:17457/17458
#Model  13.12s var:2263/2267 constraints:17456/17458
#Bound  13.74s best:inf   next:[20369,10000000] objective_lb_search
#1      14.15s best:55582 next:[20369,55581] rens_lp_lns (d=0.97 s=772 t=0.10 p=1.00 stall=0 h=auto_l0)
Segmentation fault (core dumped)

crashing_model.zip

Anything else we should know about your project / environment?
13th Gen Intel® Core™ i9-13900K × 32

willf-j commented 4 days ago

@lperron I think it may be easier to repro if you set solver.parameters.use_lns_only = True
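
For reference, a minimal sketch of what a repro script with that setting might look like. The original run_model.py is not shown in this thread, so the helper names (`apply_overrides`, `solve_file`, `REPRO_OVERRIDES`) are hypothetical, and the sketch assumes the model file is a binary-serialized CpModelProto and that the OR-Tools 9.11 Python API is installed. Only `log_search_progress` (visible in the log above) and `use_lns_only` are taken from this report.

```python
# Parameter overrides suggested for reproducing the crash.
REPRO_OVERRIDES = {"log_search_progress": True, "use_lns_only": True}


def apply_overrides(params, overrides):
    """Set each named field on a parameters object and return it."""
    for name, value in overrides.items():
        setattr(params, name, value)
    return params


def solve_file(path, overrides=None):
    """Load a model from `path`, apply the overrides, and solve it."""
    # Imported lazily so the helper above can be used without OR-Tools.
    from ortools.sat.python import cp_model

    model = cp_model.CpModel()
    with open(path, "rb") as f:
        # Assumption: the file holds a binary-serialized CpModelProto.
        model.Proto().ParseFromString(f.read())

    solver = cp_model.CpSolver()
    apply_overrides(solver.parameters, overrides or REPRO_OVERRIDES)
    status = solver.solve(model)
    return solver.status_name(status)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(solve_file(sys.argv[1]))
```

If the model was saved in text format instead, the parsing line would need `google.protobuf.text_format` rather than `ParseFromString`.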

lperron commented 1 day ago

I ran it for 3 days on main. No crash. I will close it.

willf-j commented 1 day ago

@lperron My apologies, I should have been more specific in the ticket. The issue either happens within the first 20 seconds or does not happen at all. When you ran it for 3 days, was it constantly restarting, or was it a single continuous run?

Here's the stacktrace I get when I manage to repro:

Thread 1 "sat_runner" received signal SIGSEGV, Segmentation fault.
_int_free (av=0x7fff40000030, p=0x7fff414429e0, have_lock=<optimized out>) at ./malloc/malloc.c:4590
4590    ./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  _int_free (av=0x7fff40000030, p=0x7fff414429e0, have_lock=<optimized out>) at ./malloc/malloc.c:4590
#1  0x00007ffff52a5453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#2  0x00007ffff6777326 in operations_research::sat::FeasibilityJumpSolver::~FeasibilityJumpSolver() () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#3  0x00007ffff6777609 in operations_research::sat::FeasibilityJumpSolver::~FeasibilityJumpSolver() () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#4  0x00007ffff68e939f in operations_research::sat::(anonymous namespace)::ClearSubsolversThatAreDone(std::vector<int, std::allocator<int> > const&, std::vector<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> >, std::allocator<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> > > >&) () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#5  0x00007ffff68ea3f5 in operations_research::sat::NonDeterministicLoop(std::vector<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> >, std::allocator<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> > > >&, int) () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#6  0x00007ffff66c06a2 in operations_research::sat::(anonymous namespace)::LaunchSubsolvers(operations_research::sat::SatParameters const&, operations_research::sat::SharedClasses*, std::vector<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> >, std::allocator<std::unique_ptr<operations_research::sat::SubSolver, std::default_delete<operations_research::sat::SubSolver> > > >&, absl::lts_20240722::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>) () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#7  0x00007ffff66be9a5 in operations_research::sat::(anonymous namespace)::SolveCpModelParallel(operations_research::sat::SharedClasses*, operations_research::sat::Model*) () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#8  0x00007ffff66b8a12 in operations_research::sat::SolveCpModel(operations_research::sat::CpModelProto const&, operations_research::sat::Model*) () from /home/f-j/Downloads/or-tools/build/bin/../lib/libortools.so.9
#9  0x000055555556e5c4 in main ()
(gdb) 
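
Since the crash reportedly shows up early in a run or not at all, many short runs are more likely to hit it than one long run. The following is a rough sketch of such a restart loop, not the procedure either participant actually used: the function name `count_crashes`, the 30-second window, and the run count are all assumptions. It relies on the POSIX convention that a child killed by signal N is reported with return code -N.

```python
import signal
import subprocess
import sys


def count_crashes(cmd, runs=20, timeout_s=30.0):
    """Run `cmd` repeatedly and count the runs killed by SIGSEGV.

    Each run is stopped after `timeout_s` seconds, on the assumption
    that a run surviving the early search phase will not crash.
    """
    crashes = 0
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The run survived the window; subprocess.run has already
            # killed the child, so move on to the next attempt.
            continue
        # On POSIX, return code -N means "terminated by signal N".
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to stress is everything after the script name.
    n = count_crashes(sys.argv[1:], runs=100, timeout_s=30.0)
    print(f"{n} segfaults in 100 runs")
```

Saved under a hypothetical name such as stress.py, it could be invoked as `python stress.py python run_model.py cp_sat_model.proto`, or pointed at the sat_runner binary from the stack trace above.
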

lperron commented 1 day ago

I ran it manually around 100 times. It never crashed. And I know we fixed a few crashes post 9.11.

I will continue trying.
