ralna / spral

Sparse Parallel Robust Algorithms Library
https://ralna.github.io/spral/

Segmentation fault on large linear systems #218

Open amontoison opened 1 month ago

amontoison commented 1 month ago

For JuMP-dev 2024, I wanted to report the elapsed time for solving a very large optimization problem with Ipopt and several linear solvers (MUMPS, SPRAL, MA27, MA57), but SPRAL crashed with a segmentation fault:

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.14, running with linear solver spral.

Number of nonzeros in equality constraint Jacobian...:  3194444
Number of nonzeros in inequality constraint Jacobian.:   756090
Number of nonzeros in Lagrangian Hessian.............:  5708220

[2075969] signal (11.1): Segmentation fault
in expression starting at /home/montalex/JuMP-dev/jump.jl:24
_ZN5spral5ssids3cpu25assemble_expected_contribIdNS1_14BuddyAllocatorIdSaIdEEEPiEEviiRNS1_11NumericNodeIT_T0_EERKSA_RKT1_S6_ at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
GOMP_task at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:584
_ZN5spral5ssids3cpu13assemble_postIdNS1_14BuddyAllocatorIdSaIdEEEEEviRKNS1_12SymbolicNodeEPPvRNS1_11NumericNodeIT_T0_EERSD_RSt6vectorINS1_9WorkspaceESaISI_EE at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
_ZN5spral5ssids3cpu14NumericSubtreeILb0EdLm8388608ENS1_11AppendAllocIdEEEC2ERKNS1_15SymbolicSubtreeEPKdSA_PPvRKNS1_18cpu_factor_optionsERNS1_11ThreadStatsE._omp_fn.1 at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
GOMP_task at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:584
_ZN5spral5ssids3cpu14NumericSubtreeILb0EdLm8388608ENS1_11AppendAllocIdEEEC1ERKNS1_15SymbolicSubtreeEPKdSA_PPvRKNS1_18cpu_factor_optionsERNS1_11ThreadStatsE at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
spral_ssids_cpu_create_num_subtree_dbl at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
__spral_ssids_cpu_subtree_MOD_factor at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.2 at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
GOMP_taskgroup_end at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:2330
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.1 at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
GOMP_parallel at /workspace/srcdir/gcc-13.2.0/libgomp/parallel.c:178
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.0 at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
GOMP_parallel at /workspace/srcdir/gcc-13.2.0/libgomp/parallel.c:178
__spral_ssids_fkeep_MOD_inner_factor_cpu at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
__spral_ssids_MOD_ssids_factor_ptr64_double at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
__spral_ssids_MOD_ssids_factor_ptr32_double at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
spral_ssids_factor_ptr32 at /home/montalex/.julia/artifacts/62583b2b732b56db59087f82f543fc544608546f/lib/libspral.so (unknown line)
_ZN5Ipopt20SpralSolverInterface10MultiSolveEbPKiS2_iPdbi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16TSymLinearSolver10MultiSolveERKNS_9SymMatrixERSt6vectorINS_8SmartPtrIKNS_6VectorEEESaIS8_EERS4_INS5_IS6_EESaISC_EEbi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt18StdAugSystemSolver10MultiSolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRSt6vectorINS_8SmartPtrIS5_EESaISC_EESF_SF_SF_RSA_INSB_IS4_EESaISG_EESJ_SJ_SJ_bi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt15AugSystemSolver5SolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRS5_SA_SA_SA_RS4_SB_SB_SB_bi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt22LeastSquareMultipliers20CalculateMultipliersERNS_6VectorES2_ at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18least_square_multsERKNS_10JournalistERNS_8IpoptNLPERNS_9IpoptDataERNS_25IpoptCalculatedQuantitiesERKNS_8SmartPtrINS_22EqMultiplierCalculatorEEEd at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18SetInitialIteratesEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt14IpoptAlgorithm18InitializeIteratesEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt14IpoptAlgorithm8OptimizeEb at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication13call_optimizeEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEERNS1_INS_16AlgorithmBuilderEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication12OptimizeTNLPERKNS_8SmartPtrINS_4TNLPEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
IpoptSolve at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
IpoptSolve at /home/montalex/.julia/packages/Ipopt/bqp63/src/C_wrapper.jl:442
#solve!#7 at /home/montalex/.julia/packages/NLPModelsIpopt/0YgvC/src/NLPModelsIpopt.jl:240
solve! at /home/montalex/.julia/packages/NLPModelsIpopt/0YgvC/src/NLPModelsIpopt.jl:161 [inlined]
#ipopt#6 at /home/montalex/.julia/packages/NLPModelsIpopt/0YgvC/src/NLPModelsIpopt.jl:158 [inlined]
ipopt at /home/montalex/.julia/packages/NLPModelsIpopt/0YgvC/src/NLPModelsIpopt.jl:155
unknown function (ip: 0x7f41f4805d59)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2136
include at ./client.jl:489
unknown function (ip: 0x7f41e41fa395)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91737.1 at /home/montalex/Applications/julia/julia-1.10.4/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82703.1 at /home/montalex/Applications/julia/julia-1.10.4/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82729.1 at /home/montalex/Applications/julia/julia-1.10.4/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at julia (unknown line)
__libc_start_call_main at /lib64/libc.so.6 (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 167151010 (Pool: 167122640; Big: 28370); GC: 65
Segmentation fault (core dumped)

The culprit seems to be this function: spral::ssids::cpu::assemble_expected_contrib, the top frame of the backtrace.
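The libspral.so frames in the backtrace are Itanium-ABI mangled C++ names, which makes the trace hard to read. As a minimal sketch, assuming a GCC or Clang toolchain where the non-standard <cxxabi.h> header provides abi::__cxa_demangle, the top frame can be decoded like this (demangle is only an illustrative helper name):

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ ABI, not ISO C++)
#include <cstdlib>   // std::free
#include <string>

// Demangle an Itanium-ABI C++ symbol such as "_ZN5spral5ssids3cpu...".
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // On success the result is allocated with malloc and must be freed.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) return mangled;
    std::string out(raw);
    std::free(raw);
    return out;
}
```

Passing the first libspral.so frame through demangle gives a signature naming spral::ssids::cpu::assemble_expected_contrib, instantiated with double, a BuddyAllocator<double, std::allocator<double>> and int* as its template arguments. The c++filt tool from binutils does the same job without writing any code.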

jfowkes commented 1 month ago

@amontoison I have no idea what that function even does 😢 Can you run it through the address sanitizer?

amontoison commented 1 month ago

@jfowkes I compiled SPRAL with --buildtype=debug tonight and got the following errors:

jfowkes commented 1 month ago

@mjacobse do you have any idea what’s going on here or how we could best debug this?

mjacobse commented 1 month ago

Does it also segfault when running SSIDS serially, i.e. when built without OpenMP or with OMP_NUM_THREADS set to 1?

A way to reproduce this would be helpful. Perhaps the offending matrix could be exported as a .rb (Rutherford-Boeing) file?