trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License
523 stars 102 forks source link

Add location when Trixi fails #1055

Open andrewwinters5000 opened 2 years ago

andrewwinters5000 commented 2 years ago

In the event when Trixi fails due to something like a negative density or pressure it would be helpful to add additional output that provides the (x,y,z) location where the failing solution state occurs and what specifically fails.

sloede commented 2 years ago

I think this is a great idea for debugging! However, it is also likely going to be quite expensive. Thus maybe we could think about a SanityCallback that does these kinds of checks at certain intervals and spills the beans in case of bad values?

ranocha commented 2 years ago

I agree with you both - it would be juice but also quite expensive (for example, since we do not pass the spatial location to the numerical fluxes). A workaround right now might be as follows. You should get a stack trace pointing you to the problematic call. Then, you could try to figure out the internal state used there. However, you need to use something like

integrator = init(arguments, you, would, use, for, solve)
solve!(integrator)

Then, you can dig jn the internals of the integrator and check the last stage.

andrewwinters5000 commented 2 years ago

I figured this would be expensive and would likely only be used for debugging purposes. I like the idea of a callback but see that dissecting the stage information from solve might be cumbersome. Maybe we could start by adding this functionality to the self-implemented CarpenterKennedy2N to determine the best way to do it and then expand to the other solvers?

ranocha commented 2 years ago

Remember: It would need to be a stage callback

sloede commented 2 years ago

Remember: It would need to be a stage callback

It depends on what you are looking for. Do you really want to find the very first occurrence of any insane value? Or do you just want to pick it up in time before the entire simulation crashes? In the latter case, a step callback might do it as well.

Here's what I had in mind (but maybe I'm forgetting something): Create a stage (or step) callback that routinely iterates through all elements and all nodes and verifies that is_sane (or is_valid etc.) is true for each set up conserved variables. If not, figure out the geometric position, node, and element ids (for later reference), and finally print out all bad locations with extended info on where it happened, before calling it quits in an orderly fashion.

ranocha commented 2 years ago

Well, the simulation will crash immediately - if you look for negative density, you need to do this before each RHS call

ranocha commented 2 years ago

What's the cost of using a try ... catch block around our basic rhs! call?

ranocha commented 2 years ago

For now, you can use a debug workaround such as

julia> using Trixi, OrdinaryDiffEq

julia> trixi_include(joinpath(examples_dir(), "tree_1d_dgsem", "elixir_euler_density_wave.jl"),
           solver=DGSEM(polydeg=5, surface_flux=flux_chandrashekar,
                        volume_integral=VolumeIntegralFluxDifferencing(flux_chandrashekar)),
           tspan=(0.0, 100.0))

ERROR: LoadError: DomainError with -13.722976495571851:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
Stacktrace:
  [1] throw_complex_domainerror(f::Symbol, x::Float64)
    @ Base.Math ./math.jl:33
  [2] _log(x::Float64, base::Val{:ℯ}, func::Symbol)
[...]

julia> function unstable_check(checks, u_ode, semi, t)
         u = Trixi.wrap_array(u_ode, semi)
         unstable_check(checks, u, t, Trixi.mesh_equations_solver_cache(semi)...)
       end

julia> function unstable_check(checks, u, t, mesh::Trixi.TreeMesh1D, equations, dg::DG, cache)
         for element in eachelement(dg, cache) # should be `for idx in each_dof_global` or so
           for i in eachnode(dg)
             u_node = Trixi.get_node_vars(u, equations, dg, i, element)
             for check in checks
               if check(u_node, equations) == false
                 x_node = Trixi.get_node_coords(cache.elements.node_coordinates, equations, dg, i, element)
                 @error "Check failed" check u_node x_node
               end
             end
           end
         end

         return nothing
       end

julia> function debug_rhs!(du_ode, u_ode, semi, t)
           try
               Trixi.rhs!(du_ode, u_ode, semi, t)
           catch e
               unstable_check(
                   ((u, equations) -> Trixi.density(u, equations) >= 0,
                    (u, equations) -> pressure(u, equations) >= 0),
                    u_ode, semi, t)
                throw(e)
           end
       end

julia> debug_ode = ODEProblem(debug_rhs!, ode.u0, ode.tspan, ode.p)

julia> sol = solve(debug_ode, SSPRK43(), save_everystep=false)
┌ Error: Check failed
│   check = #5 (generic function with 1 method)
│   u_node =
│    3-element SVector{3, Float64} with indices SOneTo(3):
│     -9.979081056693158e-7
│      0.0015790946168858322
│     15.095183159793354
│   x_node =
│    1-element SVector{1, Float64} with indices SOneTo(1):
│     1.0
└ @ Main REPL[19]:8
ERROR: DomainError with -4.167360551827564e-7:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
Stacktrace:
  [1] debug_rhs!(du_ode::Vector{Float64}, u_ode::Vector{Float64}, semi::SemidiscretizationHyperbolic{TreeMesh{1, Trixi.SerialTree{1}}, CompressibleEulerEquations1D{Float64}, typeof(initial_condition_density_wave), Trixi.BoundaryConditionPeriodic, Nothing, DGSEM{LobattoLegendreBasis{Float64, 6, SVector{6, Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}, Trixi.LobattoLegendreMortarL2{Float64, 6, Matrix{Float64}, Matrix{Float64}}, SurfaceIntegralWeakForm{typeof(flux_chandrashekar)}, VolumeIntegralFluxDifferencing{typeof(flux_chandrashekar)}}, NamedTuple{(:elements, :interfaces, :boundaries), Tuple{Trixi.ElementContainer1D{Float64, Float64}, Trixi.InterfaceContainer1D{Float64}, Trixi.BoundaryContainer1D{Float64, Float64}}}}, t::Float64)
    @ Main ./REPL[29]:9
[...]
andrewwinters5000 commented 2 years ago

Thinking about it more it might also be good to print a solution h5 file once such an abort is encountered. Then the user can also visualize what is going on.

andrewwinters5000 commented 2 years ago

Here is an augmented version of the suggested workaround given above by @ranocha; however, it is changed to operate with P4estMesh{2}

function unstable_check(checks, u_ode, semi, t)
  u = Trixi.wrap_array(u_ode, semi)
  unstable_check(checks, u, t, Trixi.mesh_equations_solver_cache(semi)...)
end

function unstable_check(checks, u, t, mesh::Trixi.P4estMesh{2}, equations, dg::DG, cache)
  for element in eachelement(dg, cache) # should be `for idx in each_dof_global` or so
    for j in eachnode(dg), i in eachnode(dg)
      u_node = Trixi.get_node_vars(u, equations, dg, i, j, element)
      for check in checks
        if check(u_node, equations) == false
          x_node = Trixi.get_node_coords(cache.elements.node_coordinates, equations, dg, i, j, element)
          @error "Check failed" check u_node x_node
        end
      end
    end
  end

  return nothing
end

function debug_rhs!(du_ode, u_ode, semi, t)
  try
      Trixi.rhs!(du_ode, u_ode, semi, t)
  catch e
      unstable_check(
          ((u, equations) -> Trixi.density(u, equations) >= 0,
           (u, equations) -> pressure(u, equations) >= 0),
           u_ode, semi, t)
       throw(e)
  end
end

debug_ode = ODEProblem(debug_rhs!, ode.u0, ode.tspan, ode.p)