Nicholaswogan commented 3 years ago

Here are some micro julia benchmarks for Lorenz equations: https://github.com/Nicholaswogan/NumbaLSODA/tree/main/benchmark

With the best Julia methods, integrating to 100 seconds and interpolating takes a few milliseconds. I compared this to FLINT ( see attached), which is slower by a factor of at least 5.

Does this look right?

I had to adjust some of the FLINT compiler options in the CMakeLists.txt because I was getting errors. So maybe these adjustments are responsible for the speed difference.

I used gfortran 11 and compiled in Release mode.

FLINT-benchmark.zip

princemahajan commented 3 years ago

Interesting, I can reproduce it. Following are my results. Even without interpolation, I am seeing it's taking more time than Julia's numbers you published. Need to investigate this.

 --- FLINT Stat Results (         100  loops)  ---

 A. Natural Step-size
      Method   Time (ms)      FCalls    Accepted    Rejected
  DOP54        0.328E+01       63617       10575          30
  DOP853       0.172E+01       37349        2466         705
  Verner65E    0.344E+01       53682        6640          78
  Verner98R    0.281E+01       39628        2141         358
 B. Interpolated Grid
      Method   Time (ms)      FCalls    Accepted    Rejected
  DOP54        0.594E+01       63617       10575          30
  DOP853       0.344E+01       44747        2466         705
  Verner65E    0.688E+01       60322        6640          78
  Verner98R    0.578E+01       50333        2141         358

princemahajan commented 3 years ago

@Nicholaswogan So I tried to reproduce your Julia benchmark results without using the benchmarking function but with a simple loop to make sure the FLINT test script and your script are identical. And I am seeing much worse performance for Julia code this way and not entirely sure why. See your modified Julia script and results on my machine.

`using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t) @inbounds begin dx = p[1](u[2]-u[1]) dy = u[1](p[2]-u[3]) - u[2] dz = u[1]u[2] - p[3]u[3] end SA[dx,dy,dz] end u0 = SA[1.0,0.0,0.0] p = SA[10.0,28.0,8/3] tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)

@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)

@btime solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)

rtime = @elapsed for i in 1:100 solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB) end println("Tsit5: ", rtime/100*1000, " ms")

rtime = @elapsed for i in 1:100 solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB) end println("Vern8: ", rtime/100*1000, " ms")`

julia> include("TestLorenz.jl") Lorenz Tsit5: 15.803541000000001 ms Vern8: 38.870095 ms

julia> include("TestLorenz.jl") Lorenz Tsit5: 13.455046 ms Vern8: 38.820251999999996 ms

Nicholaswogan commented 3 years ago

@princemahajan I get the same times as you, when I run that exact julia script. But for some reason I get faster times when I uncomment the @btime:

using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t)
    @inbounds begin
        dx = p[1]*(u[2]-u[1])
        dy = u[1]*(p[2]-u[3]) - u[2]
        dz = u[1]*u[2] - p[3]*u[3]
    end
    SA[dx,dy,dz]
end
u0 = SA[1.0,0.0,0.0]
p = SA[10.0,28.0,8/3]
tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)
@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
@btime solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)

rtime = @elapsed for i in 1:100
solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
end
println("Tsit5: ", rtime/100*1000, " ms")

rtime = @elapsed for i in 1:100
solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern8: ", rtime/100*1000, " ms")

The results are

julia> include("TestLorenz.jl")
  2.446 ms (31 allocations: 59.23 KiB)
  1.399 ms (32 allocations: 65.72 KiB)
Tsit5: 2.60718273 ms
Vern8: 1.63626658 ms

julia> include("TestLorenz.jl")
  2.450 ms (31 allocations: 59.23 KiB)
  1.404 ms (32 allocations: 65.72 KiB)
Tsit5: 2.66984644 ms
Vern8: 1.7514147299999998 ms

princemahajan commented 3 years ago

yes i see that too. this is weird.

I also like to point out that generally in my benchmarking codes, I perturb the initial conditions slightly before every integration in the loop, this is closer to reality as you not gonna integrate the exact same initial conditions for the same time span again and again in practice. In julia diffeq, this will require the ODEProblem() invocation to be part of the timed loop. But in FLINT, you specify initial conditions directly in Integrate() method so this is more efficient this way.

princemahajan commented 3 years ago

I thought I found the issue but not. With @cpuelapsed macro that is closer to cpu_time() function I am using in FLINT for timing the code, the results are always same. FLINT's solvers do seem to be taking more time for this equation than Julia's TSIT5 and Verner8 solvers. If I can extract the number of function calls made and accepted/rejected steps from Julia solvers, then we will get an idea why is that.

julia> include("TestLorenz.jl")
Tsit5: 12.633589 ms
Vern8: 35.851274000000004 ms
Tsit5: 3.43 ms
Vern8: 1.72 ms

Nicholaswogan commented 3 years ago

You can get Julia ODE solver stats with

sol = solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8);
sol.destats

result is

DiffEqBase.DEStats
Number of function 1 evaluations:                  70647
Number of function 2 evaluations:                  0
Number of W matrix evaluations:                    0
Number of linear solves:                           0
Number of Jacobians created:                       0
Number of nonlinear solver iterations:             0
Number of nonlinear solver convergence failures:   0
Number of rootfind condition calls:                0
Number of accepted steps:                          11774
Number of rejected steps:                          0

princemahajan commented 3 years ago

Ok thanks, I had seen this but has forgotten. It’s amazing that Tsit5() it is not rejecting any steps. Its function calls are more than FLINT but still taking less time.

Sent from Mailhttps://go.microsoft.com/fwlink/?LinkId=550986 for Windows

From: Nick Wogan @.> Sent: Monday, September 6, 2021 4:05:33 PM To: princemahajan/FLINT @.> Cc: Bharat Mahajan @.>; Mention @.> Subject: Re: [princemahajan/FLINT] Slower than Julia for Lorenz system? (#3)

You can get Julia ODE solver stats with

sol = solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); sol.destats

result is

DiffEqBase.DEStats Number of function 1 evaluations: 70647 Number of function 2 evaluations: 0 Number of W matrix evaluations: 0 Number of linear solves: 0 Number of Jacobians created: 0 Number of nonlinear solver iterations: 0 Number of nonlinear solver convergence failures: 0 Number of rootfind condition calls: 0 Number of accepted steps: 11774 Number of rejected steps: 0

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/princemahajan/FLINT/issues/3#issuecomment-913852713, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AMK5YH6G2SACVIKXY2CHSMDUAUUJ3ANCNFSM5DPGSKAQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

princemahajan commented 3 years ago

Following are the test results from the latest FLINT code and Julia Diffeq package. In these results, Julia's Tsit5 is slower, DP5 and DP8 take almost same time, and Vern6 and Vern9 are faster than FLINT's corresponding solvers.

 --- FLINT Stats ---

 A. Internal Step-size
      Method     Time(s)      FCalls    Accepted    Rejected Total Steps
  DOP54        0.127E+01      253225       42078         151       42229
  DOP853       0.297E+00       74888        5839         438        6277
  Verner65E    0.105E+01      177861       22123         125       22248
  Verner98R    0.453E+00       73507        4385         223        4608
 B. Interpolated Grid
      Method     Time(s)      FCalls    Accepted    Rejected Total Steps
  DOP54        0.230E+01      253225       42078         151       42229
  DOP853       0.672E+00       92405        5839         438        6277
  Verner65E    0.266E+01      199984       22123         125       22248
  Verner98R    0.123E+01       95432        4385         223        4608

julia> sol = main();
Tsit5: 1.4893084 s
DP5: 1.2646739 s
DP8: 0.2551725 s
Vern6: 0.8604503 s
Vern9: 0.2845905 s

============ julia script:

##

using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t)
    @inbounds begin
        dx = p[1]*(u[2]-u[1])
        dy = u[1]*(p[2]-u[3]) - u[2]
        dz = u[1]*u[2] - p[3]*u[3]
    end
    SA[dx,dy,dz]
end

##

function main()

u0 = SA[1.0,0.0,0.0]
p = SA[10.0,28.0,8/3]
tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)
#@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-11,abstol=1.0e-14) # 2.612 ms (30 allocations: 59.22 KiB)
#@btime solve(prob,Vern9(),saveat=0.1,reltol=1.0e-11,abstol=1.0e-14); # 1.803 ms (31 allocations: 65.70 KiB)
sol = 0

rtime = @elapsed for i in 1:100
    sol = solve(prob,Tsit5(),reltol=1.0e-11,abstol=1.0e-14, dense=false) # 2.612 ms (30 allocations: 59.22 KiB)
end
println("Tsit5: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,DP5(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("DP5: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,DP8(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("DP8: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,Vern6(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern6: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,Vern9(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern9: ", rtime, " s")

return sol

end

princemahajan / FLINT

Slower than Julia for Lorenz system? #3

@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)

@btime solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)