Closed ChrisRackauckas closed 5 years ago
Dear @ChrisRackauckas
Cool! Nice to see a systematic comparison. Crazy teaching week this week, but next week during my protected time at the ATI I'll work on formatting some of my problems into a PR for https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl.
Don't worry about making the full notebook. As long as there's some reasonably well-contained way to create the ODEProblem (though it can be quite involved, like in the Filament problem), then I can take it from there. At this point I have quite a system for putting it together, so I can churn that part out pretty easily, and these are all run on the same computer (I should probably put the versioninfo() at the bottom, but it's a 12-core Xeon, so it should be pretty much between PC and cluster performance).
Excellent review, thank you @ChrisRackauckas. Also, I'll mention issue #14, for which I have no clue. This would save one from declaring a context at every call to lsoda, as can be done in the C version.
Oh, we've been doing that in the common interface for a while. It's just about setting itask, just like CV_NORMAL vs CV_ONE_STEP in Sundials. Of course, if you modify the context data at all you have to reset it, because it's a multistep method, so it has to know to "go back to Euler"; and yes, that is "expensive" in the sense that it'll take more steps, but it doesn't (shouldn't? I haven't checked for LSODA yet, but I know it holds for Sundials) require any allocations.
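For reference, the common-interface analogue of LSODA's one-step mode is the integrator interface. A minimal sketch (using OrdinaryDiffEq's Tsit5 here purely for illustration; the same init/step!/u_modified! calls are what the common interface exposes for any solver):

```julia
using OrdinaryDiffEq

# A small test ODE: exponential decay u' = -u.
f(u, p, t) = -u
prob = ODEProblem(f, 1.0, (0.0, 1.0))

# init + step! is the common-interface equivalent of CV_ONE_STEP /
# calling lsoda() repeatedly with the same context: the integrator
# keeps its multistep history between calls.
integrator = init(prob, Tsit5())
step!(integrator)             # advance one internal step
step!(integrator, 0.5, true)  # advance exactly 0.5 time units

# Modifying the state invalidates the history, so tell the
# integrator to reset ("go back to Euler" in multistep terms):
integrator.u = 2.0
u_modified!(integrator, true)
```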
I know that in the C lib of lsoda you can keep the context and take one more step with the same context; this is int lsoda(struct lsoda_context_t * ctx, double *y, double *t, double tout); which is called repeatedly here.
Yes, look at the common interface.
@JuliaRegistrator register()
Registration pull request created: JuliaRegistries/General/3358
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if Julia TagBot is installed, or can be done manually through the github interface, or via:
git tag -a v0.6.0 -m "<description of version>" 5ea35ea80844b9a2c40bac3612cb8d9ccc5bb432
git push origin v0.6.0
Hey, I have some nice updates on this topic. The wrappers seem pretty optimal now: I've made sure as much inlines as possible and the allocations are all gone. GC handling is all fixed even with the pointer handling (hence the GC preserve statements). So I think LSODA.jl from the common interface is, for the benchmarks at least, at its "good place". Here is a summary of the following benchmarks:
Non-stiff: http://benchmarks.juliadiffeq.org/html/NonStiffODE/linear_wpd.html http://benchmarks.juliadiffeq.org/html/NonStiffODE/ThreeBody_wpd.html http://benchmarks.juliadiffeq.org/html/NonStiffODE/Pleiades_wpd.html http://benchmarks.juliadiffeq.org/html/NonStiffODE/RigidBody_wpd.html http://benchmarks.juliadiffeq.org/html/NonStiffODE/FitzhughNagumo_wpd.html http://benchmarks.juliadiffeq.org/html/NonStiffODE/LotkaVolterra_wpd.html
Stiff: http://benchmarks.juliadiffeq.org/html/StiffODE/VanDerPol.html http://benchmarks.juliadiffeq.org/html/StiffODE/ROBER.html http://benchmarks.juliadiffeq.org/html/StiffODE/Orego.html http://benchmarks.juliadiffeq.org/html/StiffODE/Hires.html http://benchmarks.juliadiffeq.org/html/StiffODE/Pollution.html
We thoroughly destroy LSODA in the non-stiff benchmarks, so there's not much to comment on there. The interesting ones are the stiff benchmarks. If the problem is small (so not Pollution; i.e., <20 ODEs or so) and stiff, and you use high tolerances (reltol >= 1e-3), then LSODA does very well. It's currently the best in that category if you are measuring only the error of the final point. If you're using the timeseries error, then sometimes Rosenbrock23 is able to match it in this domain. It hits a cliff the moment you want more than 3 digits of accuracy, and I think the OREGO results show it well:
But that does mean that LSODA does best on many of the small stiff example problems people use with default tolerances. It's the only case where it comes out on top, but it's a very important case 😄.
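For anyone wanting to reproduce this kind of comparison, a work-precision sweep can be sketched with DiffEqDevTools. The setup below is illustrative (the tolerance ranges and setups are my choices, not the exact benchmark configuration; the real scripts live in DiffEqBenchmarks.jl):

```julia
using OrdinaryDiffEq, LSODA, Sundials, DiffEqDevTools

# ROBER: the classic small stiff test problem (3 ODEs).
function rober!(du, u, p, t)
    y1, y2, y3 = u
    k1, k2, k3 = p
    du[1] = -k1*y1 + k3*y2*y3
    du[2] =  k1*y1 - k2*y2^2 - k3*y2*y3
    du[3] =  k2*y2^2
end
prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5), (0.04, 3e7, 1e4))

abstols = 1.0 ./ 10.0 .^ (5:8)
reltols = 1.0 ./ 10.0 .^ (3:6)
setups = [Dict(:alg => lsoda()),
          Dict(:alg => Rosenbrock23()),
          Dict(:alg => CVODE_BDF())]

# A high-accuracy reference solution, then a sweep measuring only
# the endpoint error (the regime where lsoda shines at loose tolerances).
test_sol = solve(prob, Rodas5(), abstol=1e-12, reltol=1e-12)
wp = WorkPrecisionSet(prob, abstols, reltols, setups;
                      appxsol=test_sol, error_estimate=:final)
```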
We have narrowed down that a lot of this behavior may be because of how OpenBLAS is handling small matrix factorizations. One of the big improvements on the DiffEq side in these small stiff benchmarks required stating:
using LinearAlgebra
LinearAlgebra.BLAS.set_num_threads(1)
since allowing OpenBLAS to use threads (the default) when factorizations are small is really, really awful for performance. I wonder if we can skip OpenBLAS for small matrices and have a pure-Julia base case (since DiffEq is now using https://github.com/YingboMa/RecursiveFactorization.jl, we have a lot more control). @YingboMa let's look into the choice at https://github.com/YingboMa/RecursiveFactorization.jl/blob/master/src/lu.jl#L20 .
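To see the effect for yourself, one can time a small LU factorization with and without BLAS threading; a rough sketch (the matrix size and thread counts are illustrative, and results vary by machine):

```julia
using LinearAlgebra, BenchmarkTools

# A Jacobian-sized matrix for a small stiff ODE system.
A = rand(8, 8)

# Multithreaded OpenBLAS (the default behavior).
LinearAlgebra.BLAS.set_num_threads(Sys.CPU_THREADS)
t_threaded = @belapsed lu($A)

# Single-threaded OpenBLAS, as used in the benchmarks.
LinearAlgebra.BLAS.set_num_threads(1)
t_serial = @belapsed lu($A)

# On many machines the single-threaded factorization wins for
# matrices this small: thread synchronization overhead dominates
# the actual flops.
```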
Anyways, both this BLAS issue and the algorithm itself (it doesn't scale because of its factorization choices, but that's a bigger issue) are the reasons why LSODA doesn't seem to scale so well. At the same time, this BLAS behavior interacting with the algorithm makes LSODA.jl perfect for these small cases, so we'll work to overcome this case 😅.
That's a great summary! Thank you!
I am looking forward to seeing an algo beating LSODA. I can only benefit from this :D
Thank you a lot for this.
Hey @rveltz @sdwfrost
I wanted to let you know that this library got to a good enough point that I could run it through the suite of benchmarks. You can find the results uploaded at https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl
A quick summary of how lsoda performs. It does not do very well on non-stiff problems; in that case Runge-Kutta methods consistently perform better. However, at high tolerances on stiff problems it performs very well, among the best, and it is usually more efficient than Sundials' CVODE_BDF for small problems (for large problems, or with an expensive f, CVODE_BDF is the most efficient). When tolerances get lower, though, lsoda tends to have weird behavior and its time diverges to infinity. The sweet spot on stiff problems seems to be about:
where lsoda does quite well. Just thought I'd share it with you. @sdwfrost mentioned that he has some problems where it does extremely well, so I was wondering if you'd be willing to share that problem to add it to the set of benchmarks. Looking at these, it seems stiffness detection and switching is not worthwhile when problems are non-stiff, but very much worthwhile when they are stiff!
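In common-interface terms, hitting that sweet spot just means requesting loose tolerances on a small stiff system; a minimal sketch (the problem and tolerance values are my choices for illustration):

```julia
using DiffEqBase, LSODA

# Van der Pol with a large stiffness parameter: small (2 ODEs) and stiff.
function vanderpol!(du, u, p, t)
    μ = p
    du[1] = u[2]
    du[2] = μ * ((1 - u[1]^2) * u[2] - u[1])
end
prob = ODEProblem(vanderpol!, [2.0, 0.0], (0.0, 6.3), 1e6)

# Loose tolerances (reltol >= 1e-3) are where lsoda is competitive
# on small stiff systems.
sol = solve(prob, lsoda(); reltol=1e-3, abstol=1e-6)
```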
Because of these benchmarks, I will be taking the time to add the integrator interface and callbacks to lsoda. I wouldn't expect it to do that well if there are lots of events (since it's a multistep method, of course), but for low numbers of events it's likely quite good. I hope LSODA.jl can join the standard DiffEq grouping sometime in the future, though I understand if it cannot (though I will give you JuliaDiffEq membership and allow you to retain admin status of the repo; it would mostly be for maintenance and QC reasons).
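For the events case, here is a rough idea of what callback support looks like from the common interface. ContinuousCallback is the standard DiffEq API; Tsit5 stands in as the solver since the lsoda-side support is what's being added:

```julia
using OrdinaryDiffEq

# Bouncing ball: u[1] is height, u[2] is velocity.
f!(du, u, p, t) = (du[1] = u[2]; du[2] = -9.81)

condition(u, t, integrator) = u[1]               # event fires when height hits 0
affect!(integrator) = (integrator.u[2] *= -0.9)  # reflect velocity with damping
cb = ContinuousCallback(condition, affect!)

prob = ODEProblem(f!, [1.0, 0.0], (0.0, 5.0))
sol = solve(prob, Tsit5(); callback=cb)
```

Each event forces a multistep method to restart from low order, which is why frequent events hurt lsoda more than they hurt one-step Runge-Kutta methods.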