Open jmsull opened 2 years ago
Starting to look at running on GPUs. After confirming I can run the readme examples on NERSC, I tried to do a simple parallelization of the boltsolve call to the ODE solver over k, following the DIffEqGPU.jl readme and the docs here.
To do this you should be able to pass the argument EnsembleGPUArray()
to the ensemble problem as I tried to do in /examples/gpu_ensemble_boltsolve.jl. This doesn't work, apparently since (as the name indicates) EnsembleGPUArray() actually needs to map over an Array, and I am trying to do it over hierarchy structs (which works fine for EnsembleThreads() syntactically, so nothing obviously wrong there). The error I get when running the commented line in the script is:
ERROR: MethodError: no method matching Array(::Hierarchy{Float64, BasicNewtonian,
Can quickly get around this by wrapping hierarchy!
and taking only k as a parameter, so will try that.
The idea I had above did not work. I updated that file (/examples/gpu_ensemble_boltsolve.jl.) to do this wrapping (which appears to work fine for the commented out EnsembleThreads() solve, by the way).
However, I get an error that looks similar to before (though now it is apparently only upset about applying "Array" to a Float32 value rather than to the hierarchy object). What this looks like is it is trying to convert the single float k into an array...as I can reproduce this method error at the top with Array(Float32(1.))
Stack trace (not sure why getting these warnings now also...):
ERROR: LoadError: MethodError: no method matching Array(::Float32) ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 ┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable. └ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345 Closest candidates are: Array(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:442 Array(::Union{LinearAlgebra.Hermitian, LinearAlgebra.Symmetric}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/symmetric.jl:271 Array(::LinearAlgebra.SVD) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:571 ... Stacktrace: [1] (::DiffEqGPU.var"#24#30")(i::Int64) @ DiffEqGPU ./none:0 [2] MappingRF @ ./reduce.jl:95 [inlined] [3] _foldl_impl(op::Base.MappingRF{DiffEqGPU.var"#24#30", Base.BottomRF{typeof(hcat)}}, init::Base._InitialValue, itr::UnitRange{Int64}) @ Base ./reduce.jl:58 [4] foldl_impl @ ./reduce.jl:48 [inlined] [5] mapfoldl_impl @ ./reduce.jl:44 [inlined] [6] #mapfoldl#244 @ ./reduce.jl:162 [inlined] [7] mapfoldl @ ./reduce.jl:162 [inlined] [8] #mapreduce#248 @ ./reduce.jl:289 [inlined] [9] mapreduce @ ./reduce.jl:289 [inlined] [10] #reduce#250 @ ./reduce.jl:458 [inlined] [11] reduce @ ./reduce.jl:458 [inlined] [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}}) @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:283 [13] macro expansion @ ./timing.jl:299 [inlined] [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}}) @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201 [15] #solve#42 @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined] [16] top-level scope @ ./timing.jl:220
After perusing the DiffEqGPU.jl github issues I think k must be passed as an Array rather than a scalar value...
Ok so after fixing this I get a new problem that occurs apparently where the solver is actually running (as opposed to just the setup failing). This seems like a problem with the solver itself? I don't actually see it calling "needs_concrete_A()"... Change pushed below, and here is the stack trace
jsull@nid002601:/pscratch/sd/j/jsull/julia/Bolt.jl/examples> julia gpu_ensemble_boltsolve.jl
Activating project at `/pscratch/sd/j/jsull/julia/Bolt.jl`
pre type of u0, Vector{Float64}
pre type of kgrid, Vector{Float64}
pre type of xi, Float64
post type of u0, Vector{Float32}
post type of kgrid, Matrix{Float32}
Succesfully evaluted gpu derivative
test deriv: nothing
one elementFloat32
ERROR: LoadError: MethodError: no method matching needs_concrete_A(::LinSolveGPUSplitFactorize)
Closest candidates are:
needs_concrete_A(::LinearSolve.AbstractFactorization) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:33
needs_concrete_A(::LinearSolve.AbstractKrylovSubspaceMethod) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:34
needs_concrete_A(::LinearSolve.AbstractSolveFunction) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:35
Stacktrace:
[1] build_J_W(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, #unused#::Type{Float32}, #unused#::Val{true})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/derivative_utils.jl:704
[2] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, nlalg::NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, α::Int64, #unused#::Val{true})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:149
[3] build_nlsolver
@ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:124 [inlined]
[4] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, iip::Val{true})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:118
[5] alg_cache(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev2::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, t::Float64, dt::Float64, reltol::Float64, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, calck::Bool, #unused#::Val{true})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/caches/kencarp_kvaerno_caches.jl:179
[6] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Rational{Int64}, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:295
[7] __solve(::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4
[8] solve_call(_prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, args::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; merge_callbacks::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152
[9] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::KenCarp4{0, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
[10] #solve#40
@ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
[11] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
[12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
[13] macro expansion
@ ./timing.jl:299 [inlined]
[14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
[15] #solve#42
@ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
[16] top-level scope
@ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:65
I looked to see what I am doing differently from the readme example - I tried Tsit5
instead of KenCarp4
. Now I get an error that seems a bit more sinister...stuck on this, will leave it alone for now. Will note that Tsit5 with ensemble threads runs but aborts due to instability so probably wouldn't give good results here anyways - but it should at least run.
ERROR: LoadError: InvalidIRError: compiling kernel gpu_gpu_kernel(Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
[1] gpu_ensemble_hierarchy!(::SubArray{Float32, 1, CuDeviceMatrix{Float32, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Float64)
@ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
[2] overdub
@ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
[3] macro expansion
@ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
[4] overdub
@ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
[5] overdub
@ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
[1] overdub
@ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:43
[2] macro expansion
@ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
[3] overdub
@ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
[4] overdub
@ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/validation.jl:124
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:386 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/2IJ7m/src/TimerOutput.jl:252 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:384 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/utils.jl:64
[6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:332
[7] #260
@ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:325 [inlined]
[8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}})
@ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:74
[9] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:324
[10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/cache.jl:90
[11] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:297
[12] macro expansion
@ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:102 [inlined]
[13] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel)})(::Function, ::Vararg{Any}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
@ CUDAKernels ~/.julia/packages/CUDAKernels/kCOA4/src/CUDAKernels.jl:194
[14] (::DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)})(du::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64)
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:414
[15] ODEFunction
@ ~/.julia/packages/SciMLBase/OHiiA/src/scimlfunctions.jl:345 [inlined]
[16] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float64, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float64, Float32, Float32, Float64, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float64, Float32, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/perform_step/low_order_rk_perform_step.jl:627
[17] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:456
[18] #__solve#502
@ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4 [inlined]
[19] #solve_call#39
@ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152 [inlined]
[20] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
[21] #solve#40
@ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
[22] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
[23] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
[24] macro expansion
@ ./timing.jl:299 [inlined]
[25] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
[26] #solve#42
@ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
[27] top-level scope
@ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:66
I get the same error (needs_concrete_A) for the other solver we know works with Bolt - Rodas5(). It looks like we may be out of luck on this since we are using stiff solvers due to the DiffEqGPU.jl "Current limitations" section in the REDAME.jl, at least without trying something (like they show for modelingtoolkit, which I have never used) to get analytic derivatives.
It looks like GPU diffeqs aren't as push-button as we hoped. I suspect our situation is relatively complicated for an ODE problem -- I think I can write up a toy problem analog, and ask for some assistance. At the very least, we can hope for a little more transparent debugging that way.
I remember there broadly being some issue with passing a custom struct for many of the SciML packages. That's a problem since we really don't want to be passing an array of things, we depend on these interpolators for speed.
The Background works on GPU after changing the struct argument types for the quadrature weights. Array is too restrictive since CuArray <: Array = false. Verified that ForwardDiff still works through the kernel.
Added ionization history as well
Codecov Report
100.00% <ø> (+100.00%)
86.36% <ø> (-6.82%)
0.00% <0.00%> (ø)
0.00% <0.00%> (-79.27%)
0.00% <0.00%> (ø)
98.41% <0.00%> (-1.59%)
91.85% <0.00%> (+0.07%)
Continue to review full report at Codecov.