hinsley / PlantChaos

NeurDS lab Plant Chaos paper

Crashing During Traces #8

Closed dbloom2 closed 1 year ago

dbloom2 commented 1 year ago

Ran spindump (output in Julia_Spindump.txt). There are no obvious infinite loops or deadlocks; the threads appear to be running, and most of their activity is attributed to ExponentialUtilities and OrdinaryDiffEq.

hinsley commented 1 year ago

I've been running this on my home desktop (Windows 10) for about half an hour and haven't seen any crashes yet. I will try running it on my laptop later.

hinsley commented 1 year ago

Running this on my laptop, I don't get crashes. I had to alter the explorer code slightly to get it to run in the first place, but after that I never got any hang-ups or crashes. Closing.

hinsley commented 1 year ago

Never mind. The instant I closed this issue, I got my first crash. Now the application crashes within 10 seconds of starting.

jamesjscully commented 1 year ago

At least in this case, it looks like the trajectory becomes unstable, which causes the CPU to lock up for some reason.

[Two screenshots were attached here.]

jamesjscully commented 1 year ago

Actually, scratch that. I was able to trigger the bug without the solution going unstable. However, I think the issue is that VS Code was starting Julia with only 1 thread. Try starting with more threads by changing the thread count in settings.json. This works for me and is also much faster.

hinsley commented 1 year ago

Still crashing. I was running with 8 threads at top integration speed in the region of bursting chaos. It also can't be VS Code failing to initialize Julia with multiple threads; otherwise we wouldn't see speedups in parameter sweeps from multithreaded solves.

jamesjscully commented 1 year ago

Hmm, can you look at your resource monitor and see if a single CPU goes to 100%? The DiffEq package could be setting the number of threads. I ran Threads.nthreads() and it said I only had 1.

I tried to make a minimal working example (MWE) with the Lorenz system, but I can't get it to fail.
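A quick way to confirm from inside the running session whether Julia actually received more than one thread, using only `Base.Threads`. The VS Code Julia extension setting name `julia.NumThreads` and the `julia --threads=auto` command-line flag mentioned in the comments are the usual ways to request threads, but treat them as assumptions to check against your extension and Julia versions.

```julia
using Base.Threads

# Number of threads in the default pool for this session.
# In VS Code this is usually controlled by the Julia extension's
# "julia.NumThreads" setting (assumed); from a terminal,
# `julia --threads=auto` requests one thread per core.
println("Julia threads: ", nthreads())

# Record which thread handles each loop iteration. With nthreads() == 1,
# every entry is the same thread id and @threads gives no parallel speedup.
owners = zeros(Int, 8 * nthreads())
@threads for i in eachindex(owners)
    owners[i] = threadid()
end
println("Distinct threads used: ", length(unique(owners)))
```

If the first line prints 1, any "multithreaded" parameter sweep is in fact running serially.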

jamesjscully commented 1 year ago

OK, I cleaned up a few things. It now resets the step size when you change parameters, and I removed some unnecessary observables. I still got it to crash when the solution goes unstable, which is weird because I wrote an explicit exception for that case. Can you see if it crashes on your computer? Now I can only get it to crash when the solution goes unstable.
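Instead of relying on an exception, one option is to inspect the solution's return code after each solve and reset when the integration stopped early. A minimal sketch, assuming OrdinaryDiffEq and SciMLBase are both installed (SciMLBase providing `successful_retcode`). `safe_solve` and its `fallback` keyword are hypothetical names, and the blow-up ODE is a toy stand-in, not the PlantChaos model.

```julia
using OrdinaryDiffEq, SciMLBase

# Toy ODE du/dt = u^2 with u(0) = 1: the exact solution 1/(1 - t)
# blows up at t = 1, so integrating to t = 2 cannot succeed.
blowup!(du, u, p, t) = (du[1] = u[1]^2)
prob = ODEProblem(blowup!, [1.0], (0.0, 2.0))

# Hypothetical helper: solve, and if the solver stopped early
# (instability, dt underflow, maxiters, ...), return a fallback state
# instead of throwing, so a GUI loop can reset rather than crash.
function safe_solve(prob, alg; fallback = prob.u0, kwargs...)
    sol = solve(prob, alg; kwargs...)
    if SciMLBase.successful_retcode(sol)
        return sol.u[end], true
    else
        @warn "Solve stopped early" retcode=sol.retcode t=sol.t[end]
        return copy(fallback), false
    end
end

u_end, ok = safe_solve(prob, Tsit5(); abstol = 1e-6, reltol = 1e-6)
println("succeeded = ", ok, ", state = ", u_end)
```

The solver itself still prints its own warning before returning; the point is that control comes back to the caller as a value it can branch on.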

jamesjscully commented 1 year ago

It is crashing more often now that I have added return maps. My idea is to use @async as much as possible to help the scheduler out.
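One caveat worth checking: `@async` creates a task on the current thread, so it lets tasks interleave at yield points but does not move CPU-bound work off the thread running the GUI event loop. `Threads.@spawn` can place the task on another thread when Julia has more than one. A stdlib-only sketch; `heavy_work` is a hypothetical stand-in for a solve.

```julia
using Base.Threads

# Hypothetical stand-in for a CPU-bound solve: a tight loop with no yield points.
function heavy_work(n)
    s = 0.0
    for i in 1:n
        s += sin(i)^2
    end
    return s
end

# @async: the task is scheduled on the current thread. Once it is running
# heavy_work, nothing else on this thread (e.g. a GUI event loop) can run,
# because the loop never yields.
t_async = @async heavy_work(10^6)

# Threads.@spawn: the task may run on any thread in the default pool,
# so with nthreads() > 1 the calling thread stays free.
t_spawn = Threads.@spawn heavy_work(10^6)

# fetch waits for each task to finish and returns its result.
r_async = fetch(t_async)
r_spawn = fetch(t_spawn)
println("results agree: ", r_async == r_spawn)
```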

jamesjscully commented 1 year ago

I'd also minimize the interactions between observables, for example by keeping a static global problem, computing everything needed for an update, and calling remake within a single function that is lifted on evaluation.
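A minimal sketch of that structure, assuming OrdinaryDiffEq and Observables.jl are installed: one problem built once, and a single `map` over a parameter observable that calls `remake` and re-solves (Makie's `lift` performs the same operation as `map` on observables). The Lorenz system from the MWE attempt stands in for the model; `PROB`, `params`, and `trajectory` are hypothetical names.

```julia
using OrdinaryDiffEq, Observables

# Lorenz system as a stand-in for the model (in-place form).
function lorenz!(du, u, p, t)
    σ, ρ, β = p
    du[1] = σ * (u[2] - u[1])
    du[2] = u[1] * (ρ - u[3]) - u[2]
    du[3] = u[1] * u[2] - β * u[3]
end

# Built once and never mutated: the "static global problem".
const PROB = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 20.0), [10.0, 28.0, 8 / 3])

# Single input observable holding the parameter vector.
params = Observable([10.0, 28.0, 8 / 3])

# One derived observable: remake + solve happen in a single function,
# instead of a chain of observables that each trigger their own updates.
trajectory = map(params) do p
    sol = solve(remake(PROB; p = p), Tsit5(); abstol = 1e-6, reltol = 1e-6)
    sol.u  # vector of states; a plot would consume this
end

# Changing the parameters triggers exactly one re-solve.
params[] = [10.0, 24.0, 8 / 3]
println("points in new trajectory: ", length(trajectory[]))
```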

hinsley commented 1 year ago

I'd like to note that lately I haven't gotten the explorer to crash except when switching to another program and back. Hopefully this behavior holds, so we won't have issues at the SIAM poster session.

hinsley commented 1 year ago

I changed the solver from BS3 to Tsit5 with no noticeable performance hit on my laptop; even accelerated solves (speed ~4x) run without hiccups. This seems to have resolved the crashing issue, but with the solve tolerances unchanged it has introduced frequent instabilities, which coincide with trajectories/traces stopping. The simulation can still be restarted by clicking the reset button or choosing other initial conditions or parameters. Here is a console log from my most recent session:

┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596
┌ Warning: First function call produced NaNs. Exiting. Double check that none of the initial conditions, parameters, or timespan values are NaN.
└ @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/gjQVg/src/initdt.jl:249
┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/VdcHg/src/integrator_interface.jl:596

I will try playing with the solution tolerances to reduce the frequency of stopped solves.

jamesjscully commented 1 year ago

What about RK4 or DP8?

(Sent by email in reply to Carter Hinsley's comment of Fri, May 12, 2023, 12:56 PM.)
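Swapping algorithms only changes the second argument to `solve`, so candidates can be compared on the same problem. A sketch assuming OrdinaryDiffEq, with the Lorenz system as a stand-in and the abstol = reltol = 1e-6 setting reported at the end of this thread; return codes and step counts for the actual model will differ. In OrdinaryDiffEq, `RK4()` is adaptive (it carries its own error estimate), so it accepts the same tolerances as the others.

```julia
using OrdinaryDiffEq

function lorenz!(du, u, p, t)
    σ, ρ, β = p
    du[1] = σ * (u[2] - u[1])
    du[2] = u[1] * (ρ - u[3]) - u[2]
    du[3] = u[1] * u[2] - β * u[3]
end

prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 20.0), [10.0, 28.0, 8 / 3])

# Same problem and tolerances; only the algorithm changes.
algs = [("BS3", BS3()), ("Tsit5", Tsit5()), ("RK4", RK4()), ("DP8", DP8())]
results = Dict{String,Any}()
for (name, alg) in algs
    sol = solve(prob, alg; abstol = 1e-6, reltol = 1e-6)
    # With the default save_everystep = true, sol.t holds t0 plus every accepted step.
    results[name] = (retcode = sol.retcode, steps = length(sol.t) - 1, t_end = sol.t[end])
    println(rpad(name, 6), " retcode = ", sol.retcode, ", accepted steps = ", length(sol.t) - 1)
end
```

Comparing accepted step counts at a fixed tolerance gives a rough, problem-specific sense of the cost of each method before trying it in the interactive explorer.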

hinsley commented 1 year ago

Tsit5 with tight tolerances (abstol and reltol both at 1e-6) kept solve speed good and, so far, has shown no instabilities for me in any region of parameter space. I haven't had a single application crash since changing to Tsit5; closing this issue.