sdwfrost / Gillespie.jl

Stochastic Gillespie-type simulations using Julia

Specialize on the function and improve performance via StaticArrays #18

Closed. ChrisRackauckas closed this 4 years ago.

ChrisRackauckas commented 4 years ago

The DiffEq SSAStepper was only achieving the same speed as Gillespie.jl, which is odd: the DiffEq version is a sparse formulation that should have much better asymptotic scaling, at the cost of a little extra overhead on the low end due to not inlining the affect! function. So I dug in and found a few ways to speed up Gillespie.jl. Basically, you can make this package quite a bit faster by making use of static arrays.

Benchmark:

using DiffEqBiological, OrdinaryDiffEq, Random, Statistics, Test
#using Plots

sir_model = @reaction_network rn begin
  0.1/1000, s + i --> 2i
  0.01, i --> r
end
sir_prob = DiscreteProblem([999,1,0],(0.0,250.0))
sir_jump_prob = JumpProblem(sir_prob,Direct(),sir_model)

sir_sol = solve(sir_jump_prob,SSAStepper())

#using Plots; plot(sir_sol)

Random.seed!(1234)
nums = Int[]
GC.gc()
@time for i in 1:100000
  sir_sol = solve(sir_jump_prob,SSAStepper())
  push!(nums,sir_sol[end][3])
end
println("Reaction DSL: $(mean(nums))")

using Gillespie

function F(x,parms)
  (S,I,R) = x
  (beta,gamma) = parms
  infection = beta*S*I
  recovery = gamma*I
  [infection,recovery]
end

x0 = [999,1,0]
nu = [[-1 1 0];[0 -1 1]]
parms = [0.1/1000.0,0.01]
tf = 250.0
Random.seed!(1234)

# Warm-up run to compile before timing
result = ssa(copy(x0),F,nu,parms,tf)
data = ssa_data(result)
push!(nums,data[!,:x3][end])

nums = Int[]
GC.gc()
@time for i in 1:100000
  result = ssa(copy(x0),F,nu,parms,tf)
  data = ssa_data(result)
  push!(nums,data[!,:x3][end])
end
println("Gillespie: $(mean(nums))")

using StaticArrays

function F2(x,parms)
  (S,I,R) = x
  (beta,gamma) = parms
  infection = beta*S*I
  recovery = gamma*I
  SA[infection,recovery]
end

x0 = SA[999,1,0]
nu = SA[-1 1 0
        0 -1 1]
parms = SA[0.1/1000.0,0.01]
tf = 250.0
Random.seed!(1234)

# Warm-up run to compile before timing
result = ssa(copy(x0),F2,nu,parms,tf)
data = ssa_data(result)
push!(nums,data[!,:x3][end])

nums = Int[]
GC.gc()
@time for i in 1:100000
  result = ssa(copy(x0),F2,nu,parms,tf)
  data = ssa_data(result)
  push!(nums,data[!,:x3][end])
end
println("Gillespie SA: $(mean(nums))")

Results:

25.371885 seconds (164.09 M allocations: 31.951 GiB, 9.02% gc time)
Reaction DSL: 725.7405
25.931057 seconds (177.70 M allocations: 39.815 GiB, 12.76% gc time)
Gillespie: 725.53706
16.386420 seconds (15.54 M allocations: 25.301 GiB, 9.46% gc time)
Gillespie SA: 725.53706
ChrisRackauckas commented 4 years ago

@isaacsas note that with normal ConstantRateJumps the dynamic dispatch hit kicks in:

@inbounds rate = (u,p,t) -> (0.1/1000.0)*u[1]*u[2]
affect! = function (integrator)
  @inbounds integrator.u[1] -= 1
  @inbounds integrator.u[2] += 1
end
jump = ConstantRateJump(rate,affect!)

@inbounds rate = (u,p,t) -> 0.01u[2]
affect! = function (integrator)
  @inbounds integrator.u[2] -= 1
  @inbounds integrator.u[3] += 1
end
jump2 = ConstantRateJump(rate,affect!)

prob = DiscreteProblem([999,1,0],(0.0,250.0))
jump_prob = JumpProblem(prob,Direct(),jump,jump2)
sir_sol = solve(jump_prob,SSAStepper())

#using Plots; plot(sol)

nums = Int[]
GC.gc()
@time for i in 1:100000
  sol = solve(jump_prob,SSAStepper())
  push!(nums,sol[end][3])
end
println("Direct Jumps: $(mean(nums))")
31.162125 seconds (225.28 M allocations: 32.844 GiB, 7.31% gc time)
Direct Jumps: 724.74126

So this basically shows that it's the specialization on mass action jumps that allows the sparse form to match this performance.

coveralls commented 4 years ago

Coverage decreased (-1.6%) to 61.404% when pulling 05fa9df9ba6189fd9eba55b92b84e7d75ffdd0d0 on ChrisRackauckas:speed into 51fd982bf53d7d04bcfe771dd70cb5933763382f on sdwfrost:master.

ChrisRackauckas commented 4 years ago

With https://github.com/SciML/DiffEqJump.jl/pull/100, the fastest one is now DiffEq again:

using DiffEqBiological, OrdinaryDiffEq, StaticArrays, Random, Statistics, Test
#using Plots

sir_model = @reaction_network rn begin
  0.1/1000, s + i --> 2i
  0.01, i --> r
end
sir_prob = DiscreteProblem(SA[999,1,0],(0.0,250.0))
sir_jump_prob = JumpProblem(sir_prob,Direct(),sir_model)

sir_sol = solve(sir_jump_prob,SSAStepper())

#using Plots; plot(sir_sol)

Random.seed!(1234)
nums = Int[]
GC.gc()
@time for i in 1:100000
  sir_sol = solve(sir_jump_prob,SSAStepper())
  push!(nums,sir_sol[end][3])
end
println("Reaction DSL: $(mean(nums))")
13.965592 seconds (1.38 M allocations: 29.879 GiB, 6.60% gc time)
Reaction DSL: 725.7405
ChrisRackauckas commented 4 years ago

The benchmarks were updated. Results on my computer:

Implementation                              Time per simulation (ms)
Julia (Gillespie.jl)                        1.69
Julia (Gillespie.jl, Static)                0.89
Julia (DifferentialEquations.jl)            1.14
Julia (DifferentialEquations.jl, Static)    0.72
Julia (handcoded)                           0.49

Running the R benchmarks next.

ChrisRackauckas commented 4 years ago
> print(paste("R:",sirsim/N*1000.0,"ms"))
[1] "R: 784.74 ms"
> print(paste("Rcpp:",sircsim/N*1000,"ms"))
[1] "Rcpp: 1.40000000000036 ms"
> print(paste("GillespieSSA:",sirssasim/N*1000,"ms"))
[1] "GillespieSSA: 463.609999999998 ms"

so the full results are:

Implementation                              Time per simulation (ms)
R (GillespieSSA)                            463
R (handcoded)                               785
Rcpp (handcoded)                            1.40
Julia (Gillespie.jl)                        1.69
Julia (Gillespie.jl, Static)                0.89
Julia (DifferentialEquations.jl)            1.14
Julia (DifferentialEquations.jl, Static)    0.72
Julia (handcoded)                           0.49
sdwfrost commented 4 years ago

Hi @ChrisRackauckas! I've been meaning to refactor some of this for a while so that it isn't all about Float64 arrays, etc. In a recent example, I was still getting better performance with Gillespie.jl: see

https://github.com/epirecipes/sir-julia/blob/master/markdown/jump_process_gillespie/jump_gillespie.md

vs.

https://github.com/epirecipes/sir-julia/blob/master/markdown/jump_process/jump_process.md

Is this also a MassActionJump issue?

isaacsas commented 4 years ago

Chris can probably say more, but from a quick look at the latter there are two issues I can see. You should use MassActionJump for any normal mass action reactions, and if your system has only jumps you should use SSAStepper instead of FunctionMap. The latter adds overhead by supporting many other features, such as callbacks.
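
For the record, a minimal sketch of what that combination could look like for the SIR model used in the benchmarks above, assuming the DiffEqJump constructor that takes rate constants, reactant stoichiometry, and net stoichiometry:

using DiffEqJump, DiffEqBase

rates = [0.1/1000, 0.01]                     # infection (beta), recovery (gamma)
reactant_stoich = [[1=>1, 2=>1], [2=>1]]     # S + I, I
net_stoich = [[1=>-1, 2=>1], [2=>-1, 3=>1]]  # S-1, I+1 ; I-1, R+1
maj = MassActionJump(rates, reactant_stoich, net_stoich)

prob = DiscreteProblem([999,1,0], (0.0,250.0))
jump_prob = JumpProblem(prob, Direct(), maj)
sol = solve(jump_prob, SSAStepper())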

isaacsas commented 4 years ago

Also, DiffEqBiological will handle generating the fastest jump types that work for a given reaction for you, so you don't need to deal with specifying them individually.

ChrisRackauckas commented 4 years ago

Okay, two things; sorry, the second got long. Like @isaacsas mentions, SSAStepper is the best thing to use for pure SSAs. Basically, our jump system works on all of the differential equation solvers. This is a nice feature because you can then mix Gillespie with SDEs quite easily. FunctionMap is the solver for DiscreteProblem, a discrete dynamical system which by default is the identity map, so composing it with the event system gives you an SSA implementation. The nice part about that SSA implementation is that it directly uses the ODE solver components, which means all of the event handling machinery is there. However, if you aren't using fancy ODE features on top of your SSA, that machinery is just overhead, so SSAStepper is something that only does the SSA part. tl;dr: use SSAStepper unless you're doing something fancy like throwing the jumps on an SDE. I'll update the docs to make sure that's demonstrated.
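
As a concrete illustration of the trade-off, reusing jump_prob from the benchmark earlier in this thread (a sketch; FunctionMap comes from OrdinaryDiffEq):

using OrdinaryDiffEq  # provides FunctionMap

sol_ssa = solve(jump_prob, SSAStepper())   # pure-jump stepper: minimal overhead
sol_map = solve(jump_prob, FunctionMap())  # general DiscreteProblem driver: full callback machinery, more overhead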


Second thing.

Yes, @sdwfrost, the performance of that example would be due to MassActionJump. Basically, ConstantRateJump is too flexible to be handled that efficiently: we're giving the user an entire function, and they can do things like resize u or change dt. The reason for that flexibility was really that I was building multi-scale models (https://github.com/SciML/MultiScaleArrays.jl) where the protein concentrations were ODEs and the jumps caused cell birth and death, i.e. deleted and added chunks of ODEs. In that case, you require the full flexibility of a function.

However, related to this PR, every function in Julia is a different type: that's how Julia is able to specialize on input functions to allow inlining and all of that (and that's the reason for https://github.com/sdwfrost/Gillespie.jl/pull/18/files#diff-767742c5471442cde4e02da73ce34a28R24; it's a feature added in v0.5 to make higher order functions and anonymous functions fast). The notable side-effect is that each affect you can choose here is a different type, so applying the stoichiometry is type-unstable. That gives about 100ns of overhead, and if all you're doing is + and -, roughly 96ns of that is pure overhead. If you embed this inside an ODE it's <1% of the time for a step, so you don't care, but if you're doing SSA on its own it's huge! Using recursion over a tuple keeps the rate calculations type-stable, so that's a pretty good option, until you have a tuple of 20 types and Julia's compiler starts to die from all the compile-time recursion.
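
To make the type issue concrete, here is a small illustrative sketch (not DiffEqJump internals; the function names are made up). Each function has its own type, so iterating a Vector of rate functions dispatches dynamically, while recursing over a Tuple stays type-stable and can inline:

f1(u) = 0.1*u[1]*u[2]
f2(u) = 0.01*u[2]

sum_rates(fs::Tuple{}, u) = 0.0
sum_rates(fs::Tuple, u) = first(fs)(u) + sum_rates(Base.tail(fs), u)

u = [999, 1, 0]
sum_rates((f1, f2), u)       # recursion over a Tuple: type-stable, inlines
sum(f(u) for f in [f1, f2])  # Vector{Function}: each call is dynamically dispatched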

One solution is then to wrap all of these functions into the same type. This is the FunctionWrappers approach (https://github.com/yuyichao/FunctionWrappers.jl), which holds the function in a type that isn't parameterized by the function and then just ccall's the function pointer. This is fine, but such function calls have about 40ns of overhead, so it slows down the rate and the affect function a little, which again isn't bad if it's in an ODE, but it can hurt SSA performance when all of the rates and affects are cheap, which is common.
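
A sketch of that approach, assuming FunctionWrappers.jl's FunctionWrapper{ReturnType,Tuple{ArgTypes...}} constructor (the rate functions here are again made up):

import FunctionWrappers: FunctionWrapper

r1(u) = (0.1/1000)*u[1]*u[2]
r2(u) = 0.01*u[2]

# Every wrapper has the same concrete type, so a Vector of them iterates type-stably...
rates = [FunctionWrapper{Float64,Tuple{Vector{Int}}}(r) for r in (r1, r2)]

u = [999, 1, 0]
total = sum(w(u) for w in rates)  # ...but each call goes through a function pointer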

So then there's mass action jump, which assumes a certain form (mass action laws), and then is just able to hold vectors or static arrays and know from an index how the function would've worked. This then acts more like Gillespie.jl because then you just have one function you know you're going to call, so it can inline and it is type-stable. The advantage of this over Gillespie is that it keeps the sparse form (we only evaluate exactly the rates that we need, which is then exploited in some fancy SSAs) while inlining.

The disadvantage of this route, of course, is that you can no longer write rate(u,p,t); instead you give us rate constants plus stoichiometry and we interpret those into the rate for you, so the function never exists. That's why it's MassActionJump and only supports mass action.

We could in theory add a rate(du,u,p,t) form for a DenseJump, which would be like Gillespie.jl: the user evaluates all of the rates at once and just gives us a stoichiometry for the change. However, such a method wouldn't be compatible with the fast SSAs that exploit the sparsity of the dependency graph. We'd have to make it rate(du,evaluate,u,p,t), where evaluate is a vector of booleans saying which rates to update, but then user code would need to be littered with if evaluate[i] ..., which would be extremely ugly. It would also evaluate a ton of conditions: for i in evaluate_idxs; mass_action_rate(i,...) only does n things, where n is the number of trues, while rate(du,evaluate,u,p,t) does length(evaluate) of them. If the cost of a rate is comparable to the cost of a condition, i.e. just a few multiplies and adds, then this is more expensive!
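
A sketch of the two strategies being compared, with made-up names (neither dense_rates! nor mass_action_rate is a real DiffEqJump API):

# Hypothetical dense interface: the user's rate function must branch on every reaction
function dense_rates!(du, evaluate, u, p, t)
  if evaluate[1]; du[1] = p[1]*u[1]*u[2]; end  # infection
  if evaluate[2]; du[2] = p[2]*u[2]; end       # recovery
end

# Hypothetical per-reaction rate for the sparse path (SIR reactions 1 and 2)
mass_action_rate(i, u, p) = i == 1 ? p[1]*u[1]*u[2] : p[2]*u[2]

# Sparse update: only touches the reactions whose rates actually need recomputing
function sparse_update!(du, evaluate_idxs, u, p)
  for i in evaluate_idxs
    du[i] = mass_action_rate(i, u, p)
  end
end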

Curiously, this strategy has worked well because in "most" cases the majority of jumps are mass action, so even if you sprinkle in a few odd ConstantRateJumps you're still nearly optimal. Probably the only extra case we should add is Hill functions, which are a common enough rate form that they can easily show up more than once or twice in a list of 100 rates.
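
For example, a Hill-type rate currently has to go through the general ConstantRateJump path; a sketch with made-up parameter names (assumes ConstantRateJump is in scope, as in the snippets above):

# Hill-function propensity on species 2 with hypothetical parameters Vmax, K, n
hill_rate = (u, p, t) -> p.Vmax * u[2]^p.n / (p.K^p.n + u[2]^p.n)
hill_affect! = integrator -> (integrator.u[3] += 1)
hill_jump = ConstantRateJump(hill_rate, hill_affect!)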

But all of that said (sorry this got long, but I thought I might as well write it all down for the devdocs and for future contributors to read), it became very clear a few months into DiffEqBio that we needed a symbolic front end, because there are millions of ways to optimize the underlying data structures for this problem, and the user experience is probably best if users stay one step away from the actual functions we're using. This became DiffEqBio, and now ModelingToolkit is becoming the more flexible underbelly, so we can play with a lot more ideas, like mixing in tau-leaping when the copy numbers are large enough. This is abundantly clear when you see that we have RegularJump for handling tau-leaping in a fast and sparse way; in theory we could leap over the normal jumps, but it's just not a good idea for speed. So what we now have is a way for users to give us jumps symbolically, so that we can generate all of the forms we need for the different optimized solver routines. I've been moving towards this idea of "symbolic-numeric computation", and that's the key to what ModelingToolkit.jl provides; we're using it to do things like auto-parallelize and accelerate ODEs/DAEs/etc. In the same way, this is a symbolic-numeric problem where using the symbolic structure effectively can improve the numerics, so we don't want users to go straight to the numerics (i.e. we don't want them writing functions!).

sdwfrost commented 4 years ago

Thanks @isaacsas @ChrisRackauckas. I've updated my epirecipes/sir-julia repo accordingly. Off-topic: if you have any ideas on how to port this example to DifferentialEquations.jl, that would be great (I get a stack overflow if I just plug in FunctionMap).

ChrisRackauckas commented 4 years ago

Taking that to https://github.com/epirecipes/sir-julia/issues/4 and https://github.com/epirecipes/sir-julia/issues/5