xzackli / Bolt.jl

differentiable boltzmann code
MIT License

GPU testing #61

Open jmsull opened 2 years ago

codecov-commenter commented 2 years ago

Codecov Report

Merging #61 (b761078) into main (d97d7c0) will decrease coverage by 25.04%. The diff coverage is 0.00%.

:exclamation: Current head b761078 differs from pull request most recent head 152f307. Consider uploading reports for the commit 152f307 to get more accurate results.

@@             Coverage Diff             @@
##             main      #61       +/-   ##
===========================================
- Coverage   70.74%   45.70%   -25.05%     
===========================================
  Files           7        8        +1     
  Lines         694      698        +4     
===========================================
- Hits          491      319      -172     
- Misses        203      379      +176     
Impacted Files Coverage Δ
src/Bolt.jl 100.00% <ø> (+100.00%) :arrow_up:
src/background.jl 86.36% <ø> (-6.82%) :arrow_down:
src/gpu.jl 0.00% <0.00%> (ø)
src/perturbations.jl 0.00% <0.00%> (-79.27%) :arrow_down:
src/spectra.jl 0.00% <0.00%> (ø)
src/util.jl 98.41% <0.00%> (-1.59%) :arrow_down:
src/ionization/recfast.jl 91.85% <0.00%> (+0.07%) :arrow_up:
... and 2 more


jmsull commented 2 years ago

Starting to look at running on GPUs. After confirming I can run the README examples on NERSC, I tried a simple parallelization over k of the boltsolve call to the ODE solver, following the DiffEqGPU.jl README and the docs here.
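For context, the DiffEqGPU.jl README pattern being followed here looks roughly like the following sketch (the upstream Lorenz example, not Bolt code, reproduced from memory, so treat it as schematic):

```julia
using OrdinaryDiffEq, DiffEqGPU

# Upstream-style example: each trajectory varies a plain numeric parameter
# vector p, which EnsembleGPUArray can batch into a matrix on the GPU.
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

u0 = Float32[1.0, 0.0, 0.0]
p = Float32[10.0, 28.0, 8/3]
prob = ODEProblem(lorenz!, u0, (0.0f0, 100.0f0), p)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3) .* p)
eprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(eprob, Tsit5(), EnsembleGPUArray(), trajectories = 10_000, saveat = 1.0f0)
```

The key point is that the per-trajectory parameters are plain Float32 arrays, not structs.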

To do this you should be able to pass EnsembleGPUArray() to the ensemble solve, as I tried to do in /examples/gpu_ensemble_boltsolve.jl. This doesn't work, apparently because (as the name indicates) EnsembleGPUArray() actually needs to map over an Array, while I am trying to map over Hierarchy structs (which works fine syntactically with EnsembleThreads(), so nothing is obviously wrong there). The error I get when running the commented line in the script is:

ERROR: MethodError: no method matching Array(::Hierarchy{Float64, BasicNewtonian,

I can probably get around this quickly by wrapping hierarchy! so that it takes only k as a parameter, so I will try that.

jmsull commented 2 years ago

The idea I had above did not work. I updated that file (/examples/gpu_ensemble_boltsolve.jl) to do this wrapping (which, by the way, appears to work fine for the commented-out EnsembleThreads() solve).

However, I get an error similar to the one before, though now it is only upset about applying Array to a Float32 value rather than to the Hierarchy object. It looks like it is trying to convert the single float k into an array: I can reproduce the MethodError at the top of the trace with Array(Float32(1.)).

Stack trace (not sure why I am also getting these warnings now...):

┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345
(the warning above repeats several more times)
ERROR: LoadError: MethodError: no method matching Array(::Float32)
Closest candidates are:
  Array(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:442
  Array(::Union{LinearAlgebra.Hermitian, LinearAlgebra.Symmetric}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/symmetric.jl:271
  Array(::LinearAlgebra.SVD) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:571
  ...
Stacktrace:
  [1] (::DiffEqGPU.var"#24#30")(i::Int64)
     @ DiffEqGPU ./none:0
  [2] MappingRF
     @ ./reduce.jl:95 [inlined]
  [3] _foldl_impl(op::Base.MappingRF{DiffEqGPU.var"#24#30", Base.BottomRF{typeof(hcat)}}, init::Base._InitialValue, itr::UnitRange{Int64})
     @ Base ./reduce.jl:58
  [4] foldl_impl
     @ ./reduce.jl:48 [inlined]
  [5] mapfoldl_impl
     @ ./reduce.jl:44 [inlined]
  [6] #mapfoldl#244
     @ ./reduce.jl:162 [inlined]
  [7] mapfoldl
     @ ./reduce.jl:162 [inlined]
  [8] #mapreduce#248
     @ ./reduce.jl:289 [inlined]
  [9] mapreduce
     @ ./reduce.jl:289 [inlined]
 [10] #reduce#250
     @ ./reduce.jl:458 [inlined]
 [11] reduce
     @ ./reduce.jl:458 [inlined]
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
     @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:283
 [13] macro expansion
     @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
     @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
     @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
     @ ./timing.jl:220

jmsull commented 2 years ago

After perusing the DiffEqGPU.jl GitHub issues, I think k must be passed as an Array rather than as a scalar value...
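A hypothetical sketch of what that fix could look like, since DiffEqGPU collects each trajectory's parameters with hcat, so p must be an Array even when it holds a single scalar (`hierarchy_rhs_for_k!` is a placeholder here, not an actual Bolt function):

```julia
# Assumption: the fixed background/ionization data is closed over,
# and only k varies per trajectory.
function gpu_ensemble_hierarchy!(du, u, p, x)
    k = p[1]                           # unwrap the scalar from its one-element array
    hierarchy_rhs_for_k!(du, u, k, x)  # placeholder per-k Boltzmann hierarchy RHS
    return nothing
end

prob = ODEProblem(gpu_ensemble_hierarchy!, Float32.(u0), (x_i, x_f), Float32[kgrid[1]])
prob_func = (prob, i, repeat) -> remake(prob, p = Float32[kgrid[i]])  # p = [k], not p = k
eprob = EnsembleProblem(prob, prob_func = prob_func)
```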

jmsull commented 2 years ago

OK, after fixing this I get a new problem, now apparently from where the solver is actually running (as opposed to just the setup failing). This seems like a problem with the solver itself? I don't actually see where it calls needs_concrete_A()... Change pushed below, and here is the stack trace:

jsull@nid002601:/pscratch/sd/j/jsull/julia/Bolt.jl/examples> julia gpu_ensemble_boltsolve.jl
  Activating project at `/pscratch/sd/j/jsull/julia/Bolt.jl`
pre type of u0, Vector{Float64}
pre type of kgrid, Vector{Float64}
pre type of xi, Float64
post type of u0, Vector{Float32}
post type of kgrid, Matrix{Float32}
Succesfully evaluted gpu derivative
test deriv: nothing
one elementFloat32
ERROR: LoadError: MethodError: no method matching needs_concrete_A(::LinSolveGPUSplitFactorize)
Closest candidates are:
  needs_concrete_A(::LinearSolve.AbstractFactorization) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:33
  needs_concrete_A(::LinearSolve.AbstractKrylovSubspaceMethod) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:34
  needs_concrete_A(::LinearSolve.AbstractSolveFunction) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:35
Stacktrace:
  [1] build_J_W(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, #unused#::Type{Float32}, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/derivative_utils.jl:704
  [2] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, nlalg::NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, α::Int64, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:149
  [3] build_nlsolver
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:124 [inlined]
  [4] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, iip::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:118
  [5] alg_cache(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev2::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, t::Float64, dt::Float64, reltol::Float64, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, calck::Bool, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/caches/kencarp_kvaerno_caches.jl:179
  [6] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Rational{Int64}, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, 
alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:295
  [7] __solve(::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4
  [8] solve_call(_prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, args::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; merge_callbacks::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152
  [9] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::KenCarp4{0, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [10] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [11] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [13] macro expansion
    @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:65

jmsull commented 2 years ago

I looked at what I am doing differently from the README example and tried Tsit5 instead of KenCarp4. Now I get an error that seems a bit more sinister... I am stuck on this, so I will leave it alone for now. Note that Tsit5 with EnsembleThreads() runs but aborts due to instability, so it probably wouldn't give good results here anyway - but it should at least run.

ERROR: LoadError: InvalidIRError: compiling kernel gpu_gpu_kernel(Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] gpu_ensemble_hierarchy!(::SubArray{Float32, 1, CuDeviceMatrix{Float32, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Float64)
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [2] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [3] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [4] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [5] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:43
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [3] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [4] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/validation.jl:124
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:386 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/2IJ7m/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:384 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:332
  [7] #260
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:325 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:324
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/cache.jl:90
 [11] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:297
 [12] macro expansion
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:102 [inlined]
 [13] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel)})(::Function, ::Vararg{Any}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/kCOA4/src/CUDAKernels.jl:194
 [14] (::DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)})(du::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64)
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:414
 [15] ODEFunction
    @ ~/.julia/packages/SciMLBase/OHiiA/src/scimlfunctions.jl:345 [inlined]
 [16] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float64, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float64, Float32, Float32, Float64, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, 
DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float64, Float32, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/perform_step/low_order_rk_perform_step.jl:627
 [17] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, 
Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:456
 [18] #__solve#502
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4 [inlined]
 [19] #solve_call#39
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152 [inlined]
 [20] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [21] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [22] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [23] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [24] macro expansion
    @ ./timing.jl:299 [inlined]
 [25] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [26] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [27] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:66
jmsull commented 2 years ago

I get the same error (`needs_concrete_A`) with the other solver we know works for Bolt, Rodas5(). Judging by the "Current limitations" section of the DiffEqGPU.jl README, it looks like we may be out of luck here, since we rely on stiff solvers. The exception would be trying something like what they show for ModelingToolkit (which I have never used) to get analytic derivatives.
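For reference, the ModelingToolkit route from the DiffEqGPU README looks roughly like the sketch below, on a toy stiff ODE rather than Bolt's hierarchy (`modelingtoolkitize` has to trace the RHS symbolically, which our interpolators would likely block):

```julia
using OrdinaryDiffEq, ModelingToolkit

# Toy stiff RHS standing in for the full hierarchy.
function toy_rhs!(du, u, p, t)
    du[1] = -p[1] * u[1] + u[2]
    du[2] =  p[1] * u[1] - u[2]
end

prob = ODEProblem(toy_rhs!, [1.0, 0.0], (0.0, 1.0), [50.0])

# Trace the RHS into a symbolic system, then rebuild the problem
# with an analytic Jacobian, which the stiff-solver GPU path wants.
sys = modelingtoolkitize(prob)
prob_jac = ODEProblem(sys, prob.u0, prob.tspan, prob.p; jac = true)
sol = solve(prob_jac, Rodas5())
```

Whether this tracing step works at all through the Background interpolators is the open question.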

xzackli commented 2 years ago

It looks like GPU diffeqs aren't as push-button as we hoped. I suspect our situation is relatively complicated for an ODE problem -- I think I can write up a toy problem analog, and ask for some assistance. At the very least, we can hope for a little more transparent debugging that way.

I remember there broadly being some issue with passing a custom struct to many of the SciML packages. That's a problem, since we really don't want to pass an array of things: we depend on these interpolators for speed.
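For comparison, the pattern EnsembleGPUArray() is designed around (per the DiffEqGPU.jl README of this era) maps over plain numeric arrays, with `prob_func` only remaking the parameter vector; a minimal sketch with a stand-in Lorenz RHS, not our hierarchy:

```julia
using DiffEqGPU, OrdinaryDiffEq

# Everything the kernel touches is a number or a plain array:
# no interpolators, no custom structs.
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

u0 = Float32[1.0, 0.0, 0.0]
p  = Float32[10.0, 28.0, 8/3]
prob = ODEProblem(lorenz!, u0, (0.0f0, 100.0f0), p)

# Each trajectory swaps only the parameter array, never a struct,
# so DiffEqGPU can stack all trajectories into one CuArray.
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float32, 3) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, Tsit5(), EnsembleGPUArray(),
            trajectories = 10_000, saveat = 1.0f0)
```

Our `prob_func` varies the Hierarchy struct instead, which is presumably why the batching into a `CuArray{Float32, 2}` falls over.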

jmsull commented 2 years ago

The Background now works on the GPU after loosening the struct argument types for the quadrature weights: `Array` is too restrictive, since `CuArray <: Array` is false (CuArray subtypes AbstractArray instead). Verified that ForwardDiff still works through the kernel.
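The idea of the fix, as a minimal sketch (field names here are illustrative, not Bolt's actual Background fields):

```julia
# Too restrictive: this excludes GPU arrays, because CuArray <: Array is false.
struct BackgroundTooStrict{T}
    quad_pts::Array{T,1}
    quad_wts::Array{T,1}
end

# Parameterize on AbstractArray instead: CuArray <: AbstractArray is true,
# so the same struct accepts CPU Vectors and CuVectors while staying
# concretely typed (the array type is a type parameter, not an abstract field).
struct BackgroundLoose{T, AT <: AbstractArray{T,1}}
    quad_pts::AT
    quad_wts::AT
end

bg_cpu = BackgroundLoose(rand(4), rand(4))            # Vector{Float64} works
# bg_gpu = BackgroundLoose(CUDA.rand(4), CUDA.rand(4)) # CuVector{Float32} works on GPU
```

Keeping the array type as a parameter (rather than declaring the fields `::AbstractArray`) matters for performance, since the fields remain concretely typed.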

jmsull commented 2 years ago

Added ionization history as well