jump-dev / Ipopt.jl

A Julia interface to the Ipopt nonlinear solver
https://github.com/coin-or/ipopt

Using Ipopt in parallel threads #375

Closed inversed-ru closed 11 months ago

inversed-ru commented 1 year ago

Trying to run Ipopt in multiple parallel threads results in effectively serial execution.

using JuMP, Ipopt
replicas = Threads.nthreads()
println(replicas)

# Set up an array of models
solver = optimizer_with_attributes(Ipopt.Optimizer)
models = [read_from_file(raw"QPLib\lp\QPLIB_2353.lp") for _ in 1 : replicas]
for model in models
    set_optimizer(model, solver)
    set_silent(model)
    relax_integrality(model)
end

# Optimize
@time optimize!(models[1])
@time begin
    Threads.@threads for i in 1 : replicas
        optimize!(models[i])
    end
end

Output:

16
  1.073861 seconds (17.38 k allocations: 2.425 MiB)
 16.976941 seconds (284.01 k allocations: 37.921 MiB, 0.07% gc time, 1.39% compilation time)

My guess is that ccalls are blocking. Perhaps it would be possible to use @threadcall to make Ipopt work in multiple threads?

The issue has also been reported in this forum thread. I know I can use the Distributed module for parallelism, but I don't really need the true distributed functionality and it is much more difficult to use compared to threads.

odow commented 1 year ago

Okay, I took a deeper look at this, and it's a subtle issue that might not have a resolution.

It's not the outgoing ccalls that are blocking, but the incoming callbacks from C into Julia that Ipopt uses to evaluate functions and derivatives. Ipopt is particularly bad for threaded parallelism, because most of the time spent in the solver is actually in Julia, not in C. MILP solvers like HiGHS don't have this problem because they don't call back into Julia.
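To illustrate why those callbacks matter: the wrapper registers C-callable pointers to Julia evaluation functions, so every objective or derivative evaluation re-enters the Julia runtime. A minimal standalone sketch of that pattern (the names here are hypothetical, not the actual Ipopt.jl wrapper code):

```julia
# The same pattern Ipopt.jl uses for its evaluation callbacks:
# wrap a Julia function as a C function pointer.
julia_objective(x::Cdouble) = x^2 + 1.0

# @cfunction yields a pointer that C code can store and invoke; every
# invocation re-enters the Julia runtime, which is exactly what
# @threadcall's worker threads are not allowed to do.
const c_obj = @cfunction(julia_objective, Cdouble, (Cdouble,))

# Simulate the C side of the solver calling the callback:
result = ccall(c_obj, Cdouble, (Cdouble,), 3.0)
println(result)  # 10.0
```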

We can't use @threadcall because of the callback issue. (From the docstring: "Note that the called function should never call back into Julia.")
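For contrast, @threadcall is fine for blocking C calls that never re-enter Julia. A standalone libc example (unrelated to Ipopt, assuming a POSIX system):

```julia
# Run a blocking C call on the @threadcall thread pool, freeing the
# calling Julia task. This is legal only because libc's sleep never
# calls back into Julia -- the property Ipopt's evaluation callbacks
# violate.
ret = @threadcall(:sleep, Cuint, (Cuint,), 1)
println(ret)  # 0 when the sleep was not interrupted
```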

There's no good way around this in Ipopt, but you can use AmplNLWriter, which uses AMPL to compute derivatives instead of calling back into Julia:

using JuMP, Ipopt

function main(filename, optimizer, f)
    replicas = Threads.nthreads()
    models = [read_from_file(filename) for _ in 1:replicas]
    for model in models
        set_optimizer(model, optimizer)
        relax_integrality(model)
    end
    @time f(models)
end

function test_serial(models)
    for m in models
        optimize!(m)
    end
end

function test_threaded(models)
    Threads.@threads for m in models
        optimize!(m)
    end
end

optimizer = optimizer_with_attributes(Ipopt.Optimizer, MOI.Silent() => true)
@time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_serial)
@time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_threaded)

# Install with `] add AmplNLWriter@1.1 Ipopt_jll`
import AmplNLWriter, Ipopt_jll
optimizer = optimizer_with_attributes(
    () -> AmplNLWriter.Optimizer(Ipopt_jll.amplexe),
    "print_level" => 0, "sb" => "yes",
)
@time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_serial)
@time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_threaded)

Removing all the prints and running twice to ignore compilation, I get:

julia> @time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_serial)
  5.263893 seconds (69.52 k allocations: 9.701 MiB)
  5.526651 seconds (701.39 k allocations: 253.097 MiB, 1.25% gc time)

julia> @time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_threaded)
  5.145869 seconds (80.00 k allocations: 10.416 MiB, 1.31% compilation time)
  5.498476 seconds (753.88 k allocations: 256.707 MiB, 1.64% gc time, 2.42% compilation time)

julia> @time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_serial)
  5.883419 seconds (1.06 M allocations: 53.135 MiB, 0.17% gc time)
  6.141101 seconds (1.69 M allocations: 296.599 MiB, 1.23% gc time, 0.06% compilation time)

julia> @time main("/Users/Oscar/Downloads/QPLIB_2353.lp", optimizer, test_threaded)
  2.550396 seconds (1.06 M allocations: 53.128 MiB, 0.39% gc time)
  2.849068 seconds (1.69 M allocations: 296.558 MiB, 3.57% gc time)

So AmplNLWriter is a fraction slower in serial (expected; it has to write a file to disk), but it is faster when threaded.

inversed-ru commented 1 year ago

@odow Thank you very much for looking into this issue. I have tried the AmplNLWriter workaround, but could not make it work reliably. It works with 4 threads, but with 16 threads the execution hangs and multiple threads print this message: `Problem with integer stack size 1 1 14`

For now I have implemented a workaround using the Distributed module. It is a bit inconvenient compared to @threads, but at least it achieves excellent CPU utilization. If there is no hope of making Ipopt threads-friendly, you can close this issue.

odow commented 1 year ago

Ah. The problem is that the version of Ipopt_jll that works with AmplNLWriter is old and uses a version of MUMPS that isn't thread-safe.

I'd stick with Distributed for now.
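For anyone landing here, the Distributed pattern looks roughly like this. `solve_replica` is a hypothetical stand-in: in the real workaround you would load JuMP and Ipopt with @everywhere and call optimize! on each worker; a dummy computation keeps the sketch runnable without Ipopt installed:

```julia
using Distributed

# One worker process per replica; each has its own Ipopt instance, so
# the C-into-Julia callbacks never contend for the same Julia runtime.
addprocs(4)

# Hypothetical stand-in for building a model and calling optimize!.
# In practice this would be preceded by `@everywhere using JuMP, Ipopt`.
@everywhere solve_replica(i) = i^2

# pmap distributes the replicas across the worker processes.
results = pmap(solve_replica, 1:4)
println(results)  # [1, 4, 9, 16]
```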

odow commented 11 months ago

Closing because I don't think there is an available resolution here.