MUMPS_seq_jll@5.6.2 is slower than MUMPS_seq_jll@5.6.1

DoctorDro commented 5 months ago

Dear all, this is very easy to verify. On a clean installation of Julia type:

pkg> add PowerModels Ipopt PGLib
julia> using Ipopt,PowerModels,PGLib

For more information on where the time is spent add a single line in the ipopt.opt file and save it:

print_timing_statistics       yes

julia> solve_ac_opf(pglib(find_pglib_case("3012")[1]), Ipopt.Optimizer)
pkg> remove Ipopt
pkg> add Ipopt@1.5.1

Close and restart Julia for the changes to be applied and then try to solve the same problem again

julia> using Ipopt,PowerModels,PGLib
julia> solve_ac_opf(pglib(find_pglib_case("3012")[1]), Ipopt.Optimizer)

odow commented 5 months ago

Can you please post the full output of each run, with the timing statistics.

Also, what is the output of versioninfo() and ] st -m

DoctorDro commented 5 months ago

(@v1.10) pkg> status
Status `~/.julia/environments/v1.10/Project.toml`
  [6e4b80f9] BenchmarkTools v1.4.0
⌃ [b6b21f68] Ipopt v1.5.1
  [4076af6c] JuMP v1.18.1
  [1ec41992] MosekTools v0.15.1
  [76087f3c] NLopt v1.0.1
  [c36e90e8] PowerModels v0.21.0
  [6dd1b50a] Tulip v0.9.6
Info Packages marked with ⌃ have new versions available and may be upgradable.

julia> solve_opf("/home/Dr2Dr/pglib-opf/pglib_opf_case3012wp_k.m", ACPPowerModel, Ipopt.Optimizer)
This is Ipopt version 3.14.13, running with linear solver MUMPS 5.6.1.

Number of nonzeros in equality constraint Jacobian...:    86381
Number of nonzeros in inequality constraint Jacobian.:    28552
Number of nonzeros in Lagrangian Hessian.............:   157177

Total number of variables............................:    20955
                     variables with only lower bounds:        0
                variables with lower and upper bounds:    17943
                     variables with only upper bounds:        0
Total number of equality constraints.................:    20313
Total number of inequality constraints...............:    14276
        inequality constraints with only lower bounds:     3566
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:    10710

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.6342652e+06 4.13e+01 1.00e+02  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  1.6352129e+06 4.12e+01 9.98e+01  -1.0 6.04e+00    -  9.39e-03 2.00e-03f  1
   2  1.6151387e+06 4.07e+01 9.93e+01  -1.0 2.27e+01    -  3.60e-03 1.14e-02f  1
   3  1.5897754e+06 3.98e+01 9.77e+01  -1.0 2.66e+01    -  1.57e-02 2.25e-02f  1
   4  1.5602592e+06 3.84e+01 9.04e+01  -1.0 2.71e+01    -  8.27e-02 3.51e-02f  1
   5  1.5426129e+06 3.61e+01 7.96e+01  -1.0 2.70e+01    -  1.34e-01 6.19e-02f  1
   6  1.5421842e+06 3.14e+01 6.51e+01  -1.0 1.70e+01    -  2.00e-01 1.32e-01f  1
   7  1.5423050e+06 2.85e+01 5.81e+01  -1.0 3.46e+01    -  1.45e-01 9.30e-02h  1
   8  1.5425468e+06 2.69e+01 6.11e+01  -1.0 5.00e+01    -  2.14e-01 5.52e-02h  1
   9  1.5432835e+06 2.66e+01 6.48e+01  -1.0 4.92e+01    -  1.86e-02 1.13e-02h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  1.5750113e+06 2.55e+01 4.66e+01  -1.0 1.10e+02    -  3.24e-04 4.01e-02h  1
  11  1.6163481e+06 2.44e+01 4.43e+01  -1.0 1.20e+02    -  3.56e-02 4.30e-02h  1
  12  1.6941949e+06 2.24e+01 7.14e+01  -1.0 6.65e+01    -  4.47e-02 8.32e-02h  1
  13  1.7918558e+06 1.95e+01 1.70e+02  -1.0 4.69e+01    -  1.03e-02 1.29e-01h  1
  14  1.8063216e+06 1.91e+01 1.56e+02  -1.0 5.27e+01    -  4.05e-02 2.15e-02h  1
  15  1.8633366e+06 1.74e+01 1.71e+02  -1.0 4.39e+01    -  3.63e-02 8.68e-02h  1
  16  1.8875877e+06 1.67e+01 1.62e+02  -1.0 4.06e+01    -  4.69e-02 3.93e-02h  1
  17  1.9232737e+06 1.58e+01 1.58e+02  -1.0 3.82e+01    -  3.47e-02 5.92e-02h  1
  18  1.9418463e+06 1.53e+01 1.54e+02  -1.0 2.78e+01    -  2.49e-02 3.07e-02h  1
  19  1.9760000e+06 1.44e+01 1.44e+02  -1.0 4.16e+01    -  5.38e-02 5.62e-02h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  20  2.0209287e+06 1.33e+01 1.27e+02  -1.0 3.72e+01    -  3.11e-02 7.75e-02h  1
  21  2.0815969e+06 1.18e+01 1.02e+02  -1.0 2.66e+01    -  3.66e-02 1.13e-01h  1
  22  2.1156165e+06 1.10e+01 9.55e+01  -1.0 3.10e+01    -  8.38e-02 6.99e-02h  1
  23  2.1559266e+06 1.00e+01 8.75e+01  -1.0 2.58e+01    -  1.06e-01 8.81e-02h  1
  24  2.2131218e+06 8.66e+00 6.19e+01  -1.0 2.40e+01    -  7.26e-02 1.34e-01h  1
  25  2.2730030e+06 7.30e+00 4.26e+01  -1.0 1.61e+01    -  1.23e-01 1.57e-01h  1
  26  2.3244782e+06 6.15e+00 3.34e+01  -1.0 1.45e+01    -  1.69e-01 1.58e-01h  1
  27  2.3291338e+06 6.04e+00 1.26e+02  -1.0 1.50e+01    -  1.61e-01 1.68e-02h  1
  28  2.4005626e+06 4.46e+00 6.40e+02  -1.0 1.93e+01    -  1.04e-02 2.62e-01h  1
  29  2.4426920e+06 3.55e+00 5.49e+02  -1.0 1.03e+01    -  1.30e-01 2.03e-01h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  30  2.4727769e+06 2.91e+00 4.47e+02  -1.0 9.88e+00    -  2.66e-01 1.80e-01h  1
  31  2.5080868e+06 2.17e+00 3.21e+02  -1.0 1.52e+01    -  6.39e-01 2.56e-01h  1
  32  2.5491044e+06 1.30e+00 1.94e+02  -1.0 1.47e+01    -  4.40e-01 3.97e-01h  1
  33  2.5757333e+06 7.54e-01 1.09e+02  -1.0 1.33e+01    -  3.52e-01 4.22e-01h  1
  34  2.6074006e+06 1.11e-01 2.57e+01  -1.0 9.48e+00    -  6.45e-01 8.51e-01h  1
  35  2.6130581e+06 7.60e-04 1.52e+02  -1.0 2.35e+00    -  5.77e-01 1.00e+00h  1
  36  2.6130927e+06 1.58e-05 2.59e+00  -1.0 6.43e-02    -  1.00e+00 1.00e+00h  1
  37  2.6130989e+06 7.91e-07 1.14e-02  -1.0 2.51e-02    -  1.00e+00 1.00e+00h  1
  38  2.6082439e+06 1.88e-03 6.31e+00  -2.5 3.45e+00    -  4.17e-01 5.65e-01f  1
  39  2.6060212e+06 2.09e-03 5.10e+00  -2.5 2.93e+00    -  3.44e-01 4.15e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  40  2.6040791e+06 2.37e-03 3.53e+00  -2.5 2.69e+00    -  6.29e-01 4.89e-01f  1
  41  2.6034516e+06 2.03e-03 2.24e+01  -2.5 1.73e+00    -  8.37e-01 1.95e-01f  1
  42  2.6027684e+06 1.64e-03 1.66e+01  -2.5 1.56e+00    -  2.19e-01 2.44e-01f  1
  43  2.6020541e+06 2.58e-03 6.91e+01  -2.5 1.22e+00    -  7.79e-02 3.73e-01f  1
  44  2.6016382e+06 2.25e-03 4.47e+01  -2.5 5.80e-01    -  3.03e-01 3.80e-01f  1
  45  2.6014273e+06 1.51e-03 2.64e+01  -2.5 4.52e-01    -  9.57e-01 3.89e-01h  1
  46  2.6012506e+06 5.53e-04 8.50e+00  -2.5 3.77e-01    -  1.00e+00 6.33e-01h  1
  47  2.6011738e+06 8.96e-05 2.83e-03  -2.5 1.34e-01    -  1.00e+00 1.00e+00h  1
  48  2.6010775e+06 1.58e-04 1.29e+01  -3.8 5.50e-01    -  7.34e-01 2.71e-01f  1
  49  2.6009248e+06 4.91e-04 5.94e+00  -3.8 5.22e-01    -  6.47e-01 6.06e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  50  2.6008801e+06 3.75e-04 1.26e+01  -3.8 3.33e-01    -  9.94e-01 5.84e-01h  1
  51  2.6008685e+06 2.38e-04 1.19e+01  -3.8 3.05e-01    -  1.00e+00 4.53e-01h  1
  52  2.6008559e+06 4.51e-05 1.36e-03  -3.8 1.90e-01    -  1.00e+00 1.00e+00h  1
  53  2.6008564e+06 2.64e-08 2.43e-05  -3.8 3.13e-02    -  1.00e+00 1.00e+00h  1
  54  2.6008467e+06 2.21e-05 2.23e+00  -5.7 2.36e-01    -  7.93e-01 6.95e-01f  1
  55  2.6008444e+06 1.06e-05 3.58e+00  -5.7 2.08e-01    -  8.19e-01 5.84e-01h  1
  56  2.6008429e+06 2.44e-06 7.08e-05  -5.7 1.28e-01    -  1.00e+00 1.00e+00h  1
  57  2.6008429e+06 4.68e-08 1.43e-06  -5.7 1.70e-02    -  1.00e+00 1.00e+00h  1
  58  2.6008429e+06 3.34e-09 1.36e-07  -5.7 4.20e-03    -  1.00e+00 1.00e+00h  1
  59  2.6008427e+06 2.73e-08 5.85e-02  -8.6 1.11e-02    -  9.21e-01 9.68e-01h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  60  2.6008427e+06 4.52e-09 1.81e-07  -8.6 5.24e-03    -  1.00e+00 1.00e+00h  1
  61  2.6008427e+06 1.07e-09 4.08e-08  -8.6 1.48e-03    -  1.00e+00 1.00e+00h  1
  62  2.6008427e+06 1.08e-10 4.09e-09  -8.6 4.48e-04    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 62

                                   (scaled)                 (unscaled)
Objective...............:   1.5940443252952067e+04    2.6008427211516593e+06
Dual infeasibility......:   4.0926264852153880e-09    6.6775293732774272e-07
Constraint violation....:   9.0460213650434405e-12    1.0798500982289738e-10
Variable bound violation:   4.9107871546993920e-08    4.9107871546993920e-08
Complementarity.........:   4.9205211314646256e-09    8.0283222780976834e-07
Overall NLP error.......:   4.9205211314646256e-09    8.0283222780976834e-07

Number of objective function evaluations             = 63
Number of objective gradient evaluations             = 63
Number of equality constraint evaluations            = 63
Number of inequality constraint evaluations          = 63
Number of equality constraint Jacobian evaluations   = 63
Number of inequality constraint Jacobian evaluations = 63
Number of Lagrangian Hessian evaluations             = 62
Total seconds in IPOPT (w/o function evaluations)    =      7.824
Total seconds in NLP function evaluations            =      5.791

Timing Statistics:

OverallAlgorithm....................:     13.483 (sys:      0.130 wall:     13.614)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.460 (sys:      0.000 wall:      0.460)
 UpdateHessian......................:      3.281 (sys:      0.020 wall:      3.301)
 OutputIteration....................:      0.005 (sys:      0.000 wall:      0.005)
 UpdateBarrierParameter.............:      0.012 (sys:      0.000 wall:      0.012)
 ComputeSearchDirection.............:      7.031 (sys:      0.110 wall:      7.141)
 ComputeAcceptableTrialPoint........:      2.112 (sys:      0.000 wall:      2.113)
 AcceptTrialPoint...................:      0.017 (sys:      0.000 wall:      0.017)
 CheckConvergence...................:      0.562 (sys:      0.000 wall:      0.562)
PDSystemSolverTotal.................:      7.010 (sys:      0.110 wall:      7.120)
 PDSystemSolverSolveOnce............:      6.745 (sys:      0.109 wall:      6.854)
 ComputeResiduals...................:      0.225 (sys:      0.000 wall:      0.226)
 StdAugSystemSolverMultiSolve.......:      6.960 (sys:      0.109 wall:      7.069)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.166 (sys:      0.000 wall:      0.166)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      2.247 (sys:      0.001 wall:      2.248)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      0.000 (sys:      0.000 wall:      0.000)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.000 (sys:      0.000 wall:      0.000)
Task2...............................:      0.000 (sys:      0.000 wall:      0.000)
Task3...............................:      0.000 (sys:      0.000 wall:      0.000)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.000 (sys:      0.000 wall:      0.000)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:      5.770 (sys:      0.020 wall:      5.791)
 Objective function.................:      0.001 (sys:      0.000 wall:      0.001)
 Objective function gradient........:      0.004 (sys:      0.000 wall:      0.004)
 Equality constraints...............:      1.968 (sys:      0.000 wall:      1.968)
 Inequality constraints.............:      0.034 (sys:      0.000 wall:      0.034)
 Equality constraint Jacobian.......:      0.016 (sys:      0.000 wall:      0.016)
 Inequality constraint Jacobian.....:      0.475 (sys:      0.000 wall:      0.476)
 Lagrangian Hessian.................:      3.272 (sys:      0.020 wall:      3.292)

EXIT: Optimal Solution Found.

(@v1.10) pkg> remove Ipopt
    Updating `~/.julia/environments/v1.10/Project.toml`
  [b6b21f68] - Ipopt v1.5.1
    Updating `~/.julia/environments/v1.10/Manifest.toml`
  [b6b21f68] - Ipopt v1.5.1
  [ae81ac8f] - ASL_jll v0.1.3+0
  [e33a78d0] - Hwloc_jll v2.10.0+0
  [9cc047cb] - Ipopt_jll v300.1400.1302+0
  [d00139f3] - METIS_jll v5.1.2+0
  [d7ed1dd3] - MUMPS_seq_jll v500.600.100+0
  [656ef2d0] - OpenBLAS32_jll v0.3.24+0
  [319450e9] - SPRAL_jll v2023.8.2+0

(@v1.10) pkg> add Ipopt
   Resolving package versions...
    Updating `~/.julia/environments/v1.10/Project.toml`
  [b6b21f68] + Ipopt v1.6.0
    Updating `~/.julia/environments/v1.10/Manifest.toml`
  [b6b21f68] + Ipopt v1.6.0
  [ae81ac8f] + ASL_jll v0.1.3+0
  [e33a78d0] + Hwloc_jll v2.10.0+0
  [9cc047cb] + Ipopt_jll v300.1400.1303+0
  [d00139f3] + METIS_jll v5.1.2+0
  [d7ed1dd3] + MUMPS_seq_jll v500.600.200+0
⌅ [656ef2d0] + OpenBLAS32_jll v0.3.24+0
⌅ [319450e9] + SPRAL_jll v2023.11.15+0
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
Precompiling project...
  ✓ MUMPS_seq_jll
  ✓ SPRAL_jll
  ✓ Ipopt_jll
  ✓ Ipopt
  4 dependencies successfully precompiled in 18 seconds. 77 already precompiled.
  4 dependencies precompiled but different versions are currently loaded. Restart julia to access the new versions

and after restarting julia

using Ipopt,PowerModels

(@v1.10) pkg> status
Status `~/.julia/environments/v1.10/Project.toml`
  [6e4b80f9] BenchmarkTools v1.4.0
  [b6b21f68] Ipopt v1.6.0
  [4076af6c] JuMP v1.18.1
  [1ec41992] MosekTools v0.15.1
  [76087f3c] NLopt v1.0.1
  [c36e90e8] PowerModels v0.21.0
  [6dd1b50a] Tulip v0.9.6

julia> PowerModels.silence()
[info | PowerModels]: Suppressing information and warning messages for the rest of this session.  Use the Memento package for more fine-grained control of logging.

julia> solve_opf("/home/Dr2Dr/pglib-opf/pglib_opf_case3012wp_k.m", ACPPowerModel, Ipopt.Optimizer)
This is Ipopt version 3.14.13, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:    86381
Number of nonzeros in inequality constraint Jacobian.:    21420
Number of nonzeros in Lagrangian Hessian.............:   157177

Total number of variables............................:    20955
                     variables with only lower bounds:        0
                variables with lower and upper bounds:    17943
                     variables with only upper bounds:        0
Total number of equality constraints.................:    20313
Total number of inequality constraints...............:    10710
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:     3566
        inequality constraints with only upper bounds:     7144

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.6342652e+06 4.13e+01 1.00e+02  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  1.6352129e+06 4.12e+01 9.98e+01  -1.0 6.04e+00    -  9.39e-03 2.00e-03f  1
   2  1.6151387e+06 4.07e+01 9.93e+01  -1.0 2.27e+01    -  3.60e-03 1.14e-02f  1
   3  1.5897754e+06 3.98e+01 9.77e+01  -1.0 2.66e+01    -  1.57e-02 2.25e-02f  1
   4  1.5602592e+06 3.84e+01 9.04e+01  -1.0 2.71e+01    -  8.27e-02 3.51e-02f  1
   5  1.5426129e+06 3.61e+01 7.96e+01  -1.0 2.70e+01    -  1.34e-01 6.19e-02f  1
   6  1.5421842e+06 3.14e+01 6.51e+01  -1.0 1.70e+01    -  2.00e-01 1.32e-01f  1
   7  1.5423050e+06 2.85e+01 5.81e+01  -1.0 3.46e+01    -  1.45e-01 9.30e-02h  1
   8  1.5425468e+06 2.69e+01 6.11e+01  -1.0 5.00e+01    -  2.14e-01 5.52e-02h  1
   9  1.5432835e+06 2.66e+01 6.48e+01  -1.0 4.92e+01    -  1.86e-02 1.13e-02h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  1.5750113e+06 2.55e+01 4.66e+01  -1.0 1.10e+02    -  3.24e-04 4.01e-02h  1
  11  1.6163481e+06 2.44e+01 4.43e+01  -1.0 1.20e+02    -  3.56e-02 4.30e-02h  1
  12  1.6941949e+06 2.24e+01 7.14e+01  -1.0 6.65e+01    -  4.47e-02 8.32e-02h  1
  13  1.7918558e+06 1.95e+01 1.70e+02  -1.0 4.69e+01    -  1.03e-02 1.29e-01h  1
  14  1.8063216e+06 1.91e+01 1.56e+02  -1.0 5.27e+01    -  4.05e-02 2.15e-02h  1
  15  1.8633366e+06 1.74e+01 1.71e+02  -1.0 4.39e+01    -  3.63e-02 8.68e-02h  1
  16  1.8875877e+06 1.67e+01 1.62e+02  -1.0 4.06e+01    -  4.69e-02 3.93e-02h  1
  17  1.9232737e+06 1.58e+01 1.58e+02  -1.0 3.82e+01    -  3.47e-02 5.92e-02h  1
  18  1.9418463e+06 1.53e+01 1.54e+02  -1.0 2.78e+01    -  2.49e-02 3.07e-02h  1
  19  1.9760000e+06 1.44e+01 1.44e+02  -1.0 4.16e+01    -  5.38e-02 5.62e-02h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  20  2.0209287e+06 1.33e+01 1.27e+02  -1.0 3.72e+01    -  3.11e-02 7.75e-02h  1
  21  2.0815969e+06 1.18e+01 1.02e+02  -1.0 2.66e+01    -  3.66e-02 1.13e-01h  1
  22  2.1156165e+06 1.10e+01 9.55e+01  -1.0 3.10e+01    -  8.38e-02 6.99e-02h  1
  23  2.1559266e+06 1.00e+01 8.75e+01  -1.0 2.58e+01    -  1.06e-01 8.81e-02h  1
  24  2.2131218e+06 8.66e+00 6.19e+01  -1.0 2.40e+01    -  7.26e-02 1.34e-01h  1
  25  2.2730030e+06 7.30e+00 4.26e+01  -1.0 1.61e+01    -  1.23e-01 1.57e-01h  1
  26  2.3244782e+06 6.15e+00 3.34e+01  -1.0 1.45e+01    -  1.69e-01 1.58e-01h  1
  27  2.3291338e+06 6.04e+00 1.26e+02  -1.0 1.50e+01    -  1.61e-01 1.68e-02h  1
  28  2.4005626e+06 4.46e+00 6.40e+02  -1.0 1.93e+01    -  1.04e-02 2.62e-01h  1
  29  2.4426920e+06 3.55e+00 5.49e+02  -1.0 1.03e+01    -  1.30e-01 2.03e-01h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  30  2.4727769e+06 2.91e+00 4.47e+02  -1.0 9.88e+00    -  2.66e-01 1.80e-01h  1
  31  2.5080868e+06 2.17e+00 3.21e+02  -1.0 1.52e+01    -  6.39e-01 2.56e-01h  1
  32  2.5491044e+06 1.30e+00 1.94e+02  -1.0 1.47e+01    -  4.40e-01 3.97e-01h  1
  33  2.5757333e+06 7.54e-01 1.09e+02  -1.0 1.33e+01    -  3.52e-01 4.22e-01h  1
  34  2.6074006e+06 1.11e-01 2.57e+01  -1.0 9.48e+00    -  6.45e-01 8.51e-01h  1
  35  2.6130581e+06 7.60e-04 1.52e+02  -1.0 2.35e+00    -  5.77e-01 1.00e+00h  1
  36  2.6130927e+06 1.58e-05 2.59e+00  -1.0 6.43e-02    -  1.00e+00 1.00e+00h  1
  37  2.6130989e+06 7.91e-07 1.14e-02  -1.0 2.51e-02    -  1.00e+00 1.00e+00h  1
  38  2.6082439e+06 1.88e-03 6.31e+00  -2.5 3.45e+00    -  4.17e-01 5.65e-01f  1
  39  2.6060212e+06 2.09e-03 5.10e+00  -2.5 2.93e+00    -  3.44e-01 4.15e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  40  2.6040791e+06 2.37e-03 3.53e+00  -2.5 2.69e+00    -  6.29e-01 4.89e-01f  1
  41  2.6034516e+06 2.03e-03 2.24e+01  -2.5 1.73e+00    -  8.37e-01 1.95e-01f  1
  42  2.6027684e+06 1.64e-03 1.66e+01  -2.5 1.56e+00    -  2.19e-01 2.44e-01f  1
  43  2.6020541e+06 2.58e-03 6.91e+01  -2.5 1.22e+00    -  7.79e-02 3.73e-01f  1
  44  2.6016382e+06 2.25e-03 4.47e+01  -2.5 5.80e-01    -  3.03e-01 3.80e-01f  1
  45  2.6014273e+06 1.51e-03 2.64e+01  -2.5 4.52e-01    -  9.57e-01 3.89e-01h  1
  46  2.6012506e+06 5.53e-04 8.50e+00  -2.5 3.77e-01    -  1.00e+00 6.33e-01h  1
  47  2.6011738e+06 8.96e-05 2.83e-03  -2.5 1.34e-01    -  1.00e+00 1.00e+00h  1
  48  2.6010775e+06 1.58e-04 1.29e+01  -3.8 5.50e-01    -  7.34e-01 2.71e-01f  1
  49  2.6009248e+06 4.91e-04 5.94e+00  -3.8 5.22e-01    -  6.47e-01 6.06e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  50  2.6008801e+06 3.75e-04 1.26e+01  -3.8 3.33e-01    -  9.94e-01 5.84e-01h  1
  51  2.6008685e+06 2.38e-04 1.19e+01  -3.8 3.05e-01    -  1.00e+00 4.53e-01h  1
  52  2.6008559e+06 4.51e-05 1.36e-03  -3.8 1.90e-01    -  1.00e+00 1.00e+00h  1
  53  2.6008564e+06 2.64e-08 2.43e-05  -3.8 3.13e-02    -  1.00e+00 1.00e+00h  1
  54  2.6008467e+06 2.21e-05 2.23e+00  -5.7 2.36e-01    -  7.93e-01 6.95e-01f  1
  55  2.6008444e+06 1.06e-05 3.58e+00  -5.7 2.08e-01    -  8.19e-01 5.84e-01h  1
  56  2.6008429e+06 2.44e-06 7.08e-05  -5.7 1.28e-01    -  1.00e+00 1.00e+00h  1
  57  2.6008429e+06 4.68e-08 1.43e-06  -5.7 1.70e-02    -  1.00e+00 1.00e+00h  1
  58  2.6008429e+06 3.34e-09 1.36e-07  -5.7 4.20e-03    -  1.00e+00 1.00e+00h  1
  59  2.6008427e+06 2.74e-08 5.85e-02  -8.6 1.11e-02    -  9.21e-01 9.68e-01h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  60  2.6008427e+06 4.52e-09 1.81e-07  -8.6 5.24e-03    -  1.00e+00 1.00e+00h  1
  61  2.6008427e+06 1.07e-09 4.08e-08  -8.6 1.48e-03    -  1.00e+00 1.00e+00h  1
  62  2.6008427e+06 1.08e-10 4.10e-09  -8.6 4.48e-04    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 62

                                   (scaled)                 (unscaled)
Objective...............:   1.5940443252951929e+04    2.6008427211516369e+06
Dual infeasibility......:   4.0978685143188479e-09    6.6860822679626320e-07
Constraint violation....:   9.0254859523157263e-12    1.0773987257906015e-10
Variable bound violation:   4.9107871546993920e-08    4.9107871546993920e-08
Complementarity.........:   4.9205211297476997e-09    8.0283222752963475e-07
Overall NLP error.......:   4.9205211297476997e-09    8.0283222752963475e-07

Number of objective function evaluations             = 63
Number of objective gradient evaluations             = 63
Number of equality constraint evaluations            = 63
Number of inequality constraint evaluations          = 63
Number of equality constraint Jacobian evaluations   = 63
Number of inequality constraint Jacobian evaluations = 63
Number of Lagrangian Hessian evaluations             = 62
Total seconds in IPOPT (w/o function evaluations)    =     20.665
Total seconds in NLP function evaluations            =      5.993

Timing Statistics:

OverallAlgorithm....................:     22.578 (sys:      4.049 wall:     26.658)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.436 (sys:      0.080 wall:      0.531)
 UpdateHessian......................:      3.597 (sys:      0.067 wall:      3.666)
 OutputIteration....................:      0.006 (sys:      0.000 wall:      0.006)
 UpdateBarrierParameter.............:      0.013 (sys:      0.000 wall:      0.013)
 ComputeSearchDirection.............:     16.006 (sys:      3.895 wall:     19.915)
 ComputeAcceptableTrialPoint........:      1.982 (sys:      0.005 wall:      1.987)
 AcceptTrialPoint...................:      0.015 (sys:      0.001 wall:      0.016)
 CheckConvergence...................:      0.522 (sys:      0.001 wall:      0.522)
PDSystemSolverTotal.................:     15.988 (sys:      3.894 wall:     19.896)
 PDSystemSolverSolveOnce............:     15.708 (sys:      3.880 wall:     19.602)
 ComputeResiduals...................:      0.239 (sys:      0.013 wall:      0.252)
 StdAugSystemSolverMultiSolve.......:     15.985 (sys:      3.957 wall:     19.971)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.127 (sys:      0.021 wall:      0.162)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      4.424 (sys:      0.993 wall:      5.417)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      0.000 (sys:      0.000 wall:      0.000)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.000 (sys:      0.000 wall:      0.000)
Task2...............................:      0.000 (sys:      0.000 wall:      0.000)
Task3...............................:      0.000 (sys:      0.000 wall:      0.000)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.000 (sys:      0.000 wall:      0.000)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:      5.923 (sys:      0.068 wall:      5.993)
 Objective function.................:      0.001 (sys:      0.000 wall:      0.001)
 Objective function gradient........:      0.003 (sys:      0.000 wall:      0.003)
 Equality constraints...............:      1.845 (sys:      0.001 wall:      1.846)
 Inequality constraints.............:      0.031 (sys:      0.000 wall:      0.031)
 Equality constraint Jacobian.......:      0.016 (sys:      0.000 wall:      0.016)
 Inequality constraint Jacobian.....:      0.439 (sys:      0.000 wall:      0.439)
 Lagrangian Hessian.................:      3.588 (sys:      0.067 wall:      3.658)

As you see the time is spent in the factorization and the backsolve steps of the optimizer. This is MUMPS LDL^T factorization phase and MUMPS backsubstitution phase. The symbolic phase runtime performance has not change. But the factorization and backsubstitution phases have and therefore the problem may be related to the BLAS library MUMPS is compiled with. Is it possible that MUMPS is shipped with Vanilla BLAS statically compiled in? By the way I am running on WSL Linux Ubuntu 22.04 but I experience the same problem on the MAC M1 Pro of a colleague. In his case using Ipopt v1.6.0 the time needed was 7.6 seconds and with Ipopt v1.5.1 the time was 3.4 seconds.

odow commented 5 months ago

Also, what is the output of versioninfo() and ] st -m

Can you provide this? How did you install Julia?

Since the time is dominated by the linear system solve, there's not much to fix on our end. The traces are not identical, so it isn't surprising that the runtimes might differ. This could just be luck, or some small change in numerical handling in MUMPS, or really anything.

You could try a different linear solver:

https://github.com/jump-dev/Ipopt.jl?tab=readme-ov-file#linear-solvers

Or swap the BLAS library:

https://github.com/jump-dev/Ipopt.jl?tab=readme-ov-file#blas-and-lapack

cc @amontoison

DoctorDro commented 5 months ago

It cannot be due to MUMPS 5.6.2 because I have compiled it myself and I am using it on another application and there is no problem there. It is linked against Intel's OneAPI MKL and I have the same running times with all MUMPS versions on my application C++ based. I do not know what the Julia MUMPS or Ipopt package are doing with the BLAS. You do not need to check the ] st -m, since what happens takes place at a fresh installation of Julia. Download Julia, unpack it in some folder and then type

] add Ipopt,PowerModels,PGLib

thats it.

odow commented 5 months ago

I can reproduce, but I don't know what we can or should do about it. @amontoison is the expert.

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Script

import PowerModels, Ipopt, PGLib
PowerModels.solve_ac_opf(
    PGLib.pglib(PGLib.find_pglib_case("3012")[1]),
    Ipopt.MOI.OptimizerWithAttributes(Ipopt.Optimizer, "print_timing_statistics" => "yes"),
)

Manifest diff

1.6.0

(pglib) pkg> st -m
Status `/private/tmp/pglib/Manifest.toml`
  [b6b21f68] Ipopt v1.6.0
  [9cc047cb] Ipopt_jll v300.1400.1303+0
  [d7ed1dd3] MUMPS_seq_jll v500.600.200+0
⌅ [319450e9] SPRAL_jll v2023.11.15+0

1.5.1

(pglib2) pkg> st -m
Status `/private/tmp/pglib2/Manifest.toml`
⌃ [b6b21f68] Ipopt v1.5.1
⌅ [9cc047cb] Ipopt_jll v300.1400.1302+0
⌅ [d7ed1dd3] MUMPS_seq_jll v500.600.100+0
⌅ [319450e9] SPRAL_jll v2023.8.2+0

There are no obvious culprits to Ipopt.jl:

https://github.com/jump-dev/Ipopt.jl/compare/v1.5.1...v1.6.0

Ipopt_jll is built slightly differently:

https://github.com/JuliaPackaging/Yggdrasil/pull/7663

The --enable-static flag is dropped.

Main difference is the new MUMPS version:

https://github.com/JuliaPackaging/Yggdrasil/pull/7559/files

There are a few minor changes to coin-or/Ipopt, but nothing obvious:

https://github.com/coin-or/Ipopt/compare/1e0a5df94e3b71ce6bde972352f1b2f72f342af7...4262e538feb1909036cecf9d8720454896cdc23c

Timing

Main culprit is StdAugSystemSolverMultiSolve

1.6.0

OverallAlgorithm....................:     16.470 (sys:      1.486 wall:     15.637)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.306 (sys:      0.030 wall:      0.303)
 UpdateHessian......................:      3.001 (sys:      0.088 wall:      3.102)
 OutputIteration....................:      0.001 (sys:      0.001 wall:      0.002)
 UpdateBarrierParameter.............:      0.009 (sys:      0.001 wall:      0.009)
 ComputeSearchDirection.............:     10.977 (sys:      1.318 wall:      9.987)
 ComputeAcceptableTrialPoint........:      1.718 (sys:      0.011 wall:      1.737)
 AcceptTrialPoint...................:      0.016 (sys:      0.010 wall:      0.026)
 CheckConvergence...................:      0.441 (sys:      0.026 wall:      0.469)
PDSystemSolverTotal.................:     10.963 (sys:      1.310 wall:      9.965)
 PDSystemSolverSolveOnce............:     10.793 (sys:      1.279 wall:      9.764)
 ComputeResiduals...................:      0.141 (sys:      0.005 wall:      0.146)
 StdAugSystemSolverMultiSolve.......:     10.981 (sys:      1.279 wall:      9.918)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.092 (sys:      0.005 wall:      0.098)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      2.522 (sys:      0.042 wall:      2.532)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      0.000 (sys:      0.000 wall:      0.000)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.000 (sys:      0.000 wall:      0.000)
Task2...............................:      0.000 (sys:      0.000 wall:      0.000)
Task3...............................:      0.000 (sys:      0.000 wall:      0.000)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.000 (sys:      0.000 wall:      0.000)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:      5.038 (sys:      0.115 wall:      5.175)
 Objective function.................:      0.001 (sys:      0.000 wall:      0.001)
 Objective function gradient........:      0.004 (sys:      0.003 wall:      0.006)
 Equality constraints...............:      1.619 (sys:      0.006 wall:      1.633)
 Inequality constraints.............:      0.027 (sys:      0.000 wall:      0.028)
 Equality constraint Jacobian.......:      0.388 (sys:      0.016 wall:      0.406)
 Inequality constraint Jacobian.....:      0.004 (sys:      0.003 wall:      0.007)
 Lagrangian Hessian.................:      2.996 (sys:      0.086 wall:      3.094)

1.5.1

OverallAlgorithm....................:     10.555 (sys:      0.785 wall:     11.533)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.249 (sys:      0.021 wall:      0.275)
 UpdateHessian......................:      2.946 (sys:      0.106 wall:      3.090)
 OutputIteration....................:      0.001 (sys:      0.001 wall:      0.002)
 UpdateBarrierParameter.............:      0.009 (sys:      0.001 wall:      0.010)
 ComputeSearchDirection.............:      5.164 (sys:      0.600 wall:      5.879)
 ComputeAcceptableTrialPoint........:      1.713 (sys:      0.018 wall:      1.757)
 AcceptTrialPoint...................:      0.015 (sys:      0.010 wall:      0.025)
 CheckConvergence...................:      0.456 (sys:      0.028 wall:      0.494)
PDSystemSolverTotal.................:      5.150 (sys:      0.592 wall:      5.856)
 PDSystemSolverSolveOnce............:      4.977 (sys:      0.559 wall:      5.646)
 ComputeResiduals...................:      0.144 (sys:      0.006 wall:      0.152)
 StdAugSystemSolverMultiSolve.......:      5.104 (sys:      0.550 wall:      5.766)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.107 (sys:      0.006 wall:      0.116)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      1.569 (sys:      0.025 wall:      1.630)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      0.000 (sys:      0.000 wall:      0.000)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.000 (sys:      0.000 wall:      0.000)
Task2...............................:      0.000 (sys:      0.000 wall:      0.000)
Task3...............................:      0.000 (sys:      0.000 wall:      0.000)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.000 (sys:      0.000 wall:      0.000)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:      4.996 (sys:      0.141 wall:      5.209)
 Objective function.................:      0.001 (sys:      0.000 wall:      0.001)
 Objective function gradient........:      0.004 (sys:      0.002 wall:      0.006)
 Equality constraints...............:      1.614 (sys:      0.013 wall:      1.653)
 Inequality constraints.............:      0.028 (sys:      0.000 wall:      0.028)
 Equality constraint Jacobian.......:      0.404 (sys:      0.018 wall:      0.431)
 Inequality constraint Jacobian.....:      0.004 (sys:      0.003 wall:      0.007)
 Lagrangian Hessian.................:      2.941 (sys:      0.103 wall:      3.082)

SPRAL

I cannot reproduce with SPRAL, so this is likely a MUMPS issue.

import PowerModels, Ipopt, PGLib
PowerModels.solve_ac_opf(
    PGLib.pglib(PGLib.find_pglib_case("3012")[1]),
    Ipopt.MOI.OptimizerWithAttributes(Ipopt.Optimizer, "print_timing_statistics" => "yes", "linear_solver" => "spral"),
)

Julia 1.9

Can also reproduce.

amontoison commented 5 months ago

Can you provide the timing if you switch the BLAS / LAPACK backend with a using MKL or using AppleAccelerate before the benchmarks?

odow commented 5 months ago

More or less exactly the same timing:

julia> LinearAlgebra.BLAS.lbt_get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries: 
├ [ LP64] Accelerate
└ [ILP64] Accelerate

1.6.0

 StdAugSystemSolverMultiSolve.......:     10.598 (sys:      1.262 wall:      9.638)

1.5.1

 StdAugSystemSolverMultiSolve.......:      4.770 (sys:      0.539 wall:      5.338)

DoctorDro commented 5 months ago

I can verify that my application which is C++ based has the same running time performance with both MUMPS 5.6.1 and 5.6.2 versions. Just rechecked it. I am using OneAPI MKL BLAS on Linux. I can also verify that in Julia switching the BLAS to MKL does not change essentially anything. The slow down for MUMPS 5.6.2 is consistent and only for MUMPS. Both 1.6.0 and 1.5.1 versions of Ipopt with HSL have the same runtime performance. It is only MUMPS 5.6.2 that is problematic.

amontoison commented 5 months ago

I added some OpenMP flags when I compiled MUMPS 5.6.2. It could be the culprit. I will try to recompile locally MUMPS with BinaryBuilder.jl asap.

Update: I quickly checked with diff -r MUMPS_5.6.1/src/ MUMPS_5.6.2/src/ and I don't a culprit in the source code. The problem is related to the compilation.

DoctorDro commented 5 months ago

In all my tests OMP_NUM_THREADS=1 from the shell. Usually an OMP-parallel code slows down the higher the number of threads due to wrong parallel directives or design not appropriate for OMP (memory not copied for each thread etc). As far as I know MUMPS is MPI parallel and not OMP parallel. Correct me please if I am wrong. I am always linking against the sequential mpi library.

-L${MUMPSDIR}/libseq -L${MUMPSDIR}/lib -ldmumps -lmumps_common -lpord -lmpiseq   -lgomp -lgfortran

Furthermore, SPRAL is much slower than MUMPS and slower than HSL libraries. It relies on this hwlock library that nobody knows for sure what it offers, and even when it uses a powerful GPU it is still slower than MA57 or MA97 for large problems. It also requires particular environmental variables to be exported by the user and I cannot see where and how we could benefit from it when in academia we can use for research purposes HSL libraries that are more reliable and much faster without the GPU and ENV hassle.

amontoison commented 5 months ago

Do you compile MUMPS with -fopenmp? You need the flag -fopenmp at compilation and link time to enable OMP-code.

SPRAL was developed by Jonathan Hogg, but he left the HSL team 10 years ago. Nobody continued the development of SPRAL after. He was the only expert in C++ / CUDA kernels, everybody else is doing Fortran.

Yes, HSL is more efficient and easy to use. I'm quite happy that they accepted the collaboration for libHSL / HSL_jll.jl! :smiley:

DoctorDro commented 5 months ago

I did not know MUMPS was OMP capable. I was hoping it to be at some point, but because I noticed MPI and libmpiseq I was under the impression that it was only MPI parallel. It is good to know that it can OpenMP. I will give it a try.

I recompiled MUMPS 5.6.2 with -fopenmp and linked with -fopenmp. I set OMP_NUM_THREADS=6 and so the same running time as the sequential code. So I guess it is not OMP aware.

DoctorDro commented 5 months ago

Hi guys, I can confirm that the problem is the compilation of MUMPS. In my C++ application I compiled MUMPS adding -fopenmp at all Makefile flags. The solution of a matrix was running 2 times slower afterwards if OMP_NUM_THREADS = CPU_CORES. So there I guess there is something nasty in the OMP directives when running without MPI, just linked with the sequential MPI library.

amontoison commented 5 months ago

Thanks @DoctorDro! I recompile MUMPS with Yggdrasil now.

odow commented 5 months ago

Very weird. Thanks for digging deep into this @DoctorDro

DoctorDro commented 5 months ago

Just FYI, when I was developing my parallel OMP sparse linear solver, I would observe the same effect 2 times or more slower code, whenever I forgot to make sure that every OMP thread would access its own copy of the data. That is why one should make sure that he creates classes with all the appropriate slices of the data that each thread should process, so each thread has everything it needs to avoid the problems described above. Modern languages define the "task" environment for parallel code. I am not sure what is the status now in C++ or Fortran.

amontoison commented 5 months ago

@DoctorDro Do you see the same problem with the parallel version of MUMPS? When I compiled MUMPS 5.6.2, I added fopenmp for MUMPS_seq_jll and MUMPS_jll.

amontoison commented 5 months ago

@DoctorDro The release 1.6.1 of Ipopt.jl should fix your problem. I recompiled MUMPS_seq_jll without OpenMP. Then I upgraded Ipopt_jll for the version 3.14.14 with the updated dependencies (MUMPS / SPRAL).

jump-dev / Ipopt.jl