trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License
505 stars 98 forks source link

Allocations for `SummaryCallback` with custom integrators #1877

Open DanielDoehring opened 3 months ago

DanielDoehring commented 3 months ago

We observe allocations for the custom integrators, see

https://github.com/trixi-framework/Trixi.jl/blob/2dfde7faf3cc74f066d86148ae6c99ed9e58fa79/test/test_structured_2d.jl#L762-L774

https://github.com/trixi-framework/Trixi.jl/blob/2dfde7faf3cc74f066d86148ae6c99ed9e58fa79/test/test_tree_2d_advection.jl#L204-L215

and in the current PR #1871

https://github.com/trixi-framework/Trixi.jl/pull/1871/files#diff-4e3f4096536ca3da518b2f02b6f9197bc83ed9a7aec477b329951667dc1f830c

I observe that the allocations seem to be caused by the SummaryCallback.

For instance, for the example tree_1d_dgsem/elixir_hypdiff_nonperiodic.jl

the alloc check

t = sol.t[end]
u_ode = sol.u[end]
du_ode = similar(u_ode)
@allocated Trixi.rhs!(du_ode, u_ode, semi, t) # 7520

gives 7520 (as for the PERK2 test).

If one supplies the callbacks

callbacks = CallbackSet(#summary_callback, 
                        steady_state_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        stepsize_callback)

instead of

callbacks = CallbackSet(summary_callback, 
                        steady_state_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        stepsize_callback)

one obtains 0.

DanielDoehring commented 3 months ago

Introducing a @trixi_timeit timer() around the callbacks

        @trixi_timeit timer() "Step-Callbacks" begin
            # handle callbacks
            if callbacks isa CallbackSet
                foreach(callbacks.discrete_callbacks) do cb
                    if cb.condition(integrator.u, integrator.t, integrator)
                        cb.affect!(integrator)
                    end
                    return nothing
                end
            end
        end

one obtains for the example mentioned above:

 ────────────────────────────────────────────────────────────────────────────────────────
                Trixi.jl                        Time                    Allocations      
                                       ───────────────────────   ────────────────────────
           Tot / % measured:               6.04ms /  97.8%            347KiB /  97.3%    

 Section                       ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────
 main loop                          1   5.02ms   85.0%  5.02ms    303KiB   89.7%   303KiB
   Step-Callbacks                 544   2.77ms   46.8%  5.09μs    294KiB   87.1%     554B
     I/O                            6   1.61ms   27.2%   268μs   54.3KiB   16.0%  9.04KiB
       save solution                6   1.60ms   27.1%   266μs   43.8KiB   13.0%  7.30KiB
       ~I/O~                        6   7.98μs    0.1%  1.33μs   10.4KiB    3.1%  1.74KiB
       get element variables        6    226ns    0.0%  37.7ns     0.00B    0.0%    0.00B
       save mesh                    6    167ns    0.0%  27.8ns     0.00B    0.0%    0.00B
       get node variables           6    111ns    0.0%  18.5ns     0.00B    0.0%    0.00B
     ~Step-Callbacks~             544    639μs   10.8%  1.18μs    146KiB   43.1%     275B
     analyze solution               6    504μs    8.5%  84.0μs   94.2KiB   27.9%  15.7KiB
     calculate dt                 544   15.8μs    0.3%  29.0ns     0.00B    0.0%    0.00B
   rhs!                         2.72k   1.77ms   30.0%   651ns   6.61KiB    2.0%    2.49B
     ~rhs!~                     2.72k    982μs   16.6%   361ns   6.61KiB    2.0%    2.49B
     volume integral            2.72k    174μs    2.9%  64.0ns     0.00B    0.0%    0.00B
     source terms               2.72k    114μs    1.9%  41.9ns     0.00B    0.0%    0.00B
     boundary flux              2.72k    108μs    1.8%  39.7ns     0.00B    0.0%    0.00B
     interface flux             2.72k   82.2μs    1.4%  30.2ns     0.00B    0.0%    0.00B
     prolong2interfaces         2.72k   71.7μs    1.2%  26.4ns     0.00B    0.0%    0.00B
     prolong2boundaries         2.72k   67.2μs    1.1%  24.7ns     0.00B    0.0%    0.00B
     surface integral           2.72k   62.4μs    1.1%  22.9ns     0.00B    0.0%    0.00B
     Jacobian                   2.72k   61.9μs    1.0%  22.7ns     0.00B    0.0%    0.00B
     reset ∂u/∂t                2.72k   48.0μs    0.8%  17.6ns     0.00B    0.0%    0.00B
   ~main loop~                      1    324μs    5.5%   324μs   2.20KiB    0.7%  2.20KiB
   Runge-Kutta step             2.72k    155μs    2.6%  57.1ns     0.00B    0.0%    0.00B
 I/O                                2    783μs   13.3%   392μs   19.2KiB    5.7%  9.60KiB
   ~I/O~                            2    466μs    7.9%   233μs   11.9KiB    3.5%  5.95KiB
   save solution                    1    317μs    5.4%   317μs   7.30KiB    2.2%  7.30KiB
   get element variables            1    268ns    0.0%   268ns     0.00B    0.0%    0.00B
   save mesh                        1   28.0ns    0.0%  28.0ns     0.00B    0.0%    0.00B
   get node variables               1   18.0ns    0.0%  18.0ns     0.00B    0.0%    0.00B
 analyze solution                   1    105μs    1.8%   105μs   15.8KiB    4.7%  15.8KiB
 calculate dt                       1    302ns    0.0%   302ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────────

Any ideas why we have this for the custom integrators, but not for the ones from OrdinaryDiffEq.jl @ranocha ?

ranocha commented 3 months ago

Can you check with @code_warntype or Cthulhu.jl whether there are any type instabilities?

DanielDoehring commented 3 months ago

Hm, output for

@code_warntype summary_callback.condition(integrator.u, integrator.t, integrator)

@code_warntype summary_callback.affect!(integrator)

is all blue.

But from adding some additional @trixi_timeit timer() in the callback sections I suppose that there is a type-instability in cb.condition(integrator.u, integrator.t, integrator) which we take from OrdinaryDiffEq.jl, right?

ranocha commented 3 months ago

Can you go one level higher with Cthulhu.jl and check where the type instability is introduced?

DanielDoehring commented 3 months ago

Hm, unfortunately this gives also no new information compared to @code_warntype.