trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License
505 stars 98 forks source link

Avoid type instability in `SemidiscretizationCoupled` #1979

Closed efaulhaber closed 1 week ago

efaulhaber commented 2 weeks ago

Before:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 720 (accepted), 720 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               4.75s /  98.3%           3.31GiB / 100.0%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 copy to coupled boundaries    3.60k    4.04s   86.5%  1.12ms   3.28GiB   99.3%   956KiB
 rhs!                          21.6k    406ms    8.7%  18.8μs   5.14KiB    0.0%    0.24B
   volume integral             21.6k    215ms    4.6%  10.0μs     0.00B    0.0%    0.00B
   interface flux              21.6k   64.5ms    1.4%  2.99μs     0.00B    0.0%    0.00B
   surface integral            21.6k   48.7ms    1.0%  2.25μs     0.00B    0.0%    0.00B
   boundary flux               21.6k   33.2ms    0.7%  1.54μs     0.00B    0.0%    0.00B
   Jacobian                    21.6k   22.1ms    0.5%  1.02μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                 21.6k   12.2ms    0.3%   566ns     0.00B    0.0%    0.00B
   ~rhs!~                      21.6k   9.90ms    0.2%   458ns   5.14KiB    0.0%    0.24B
   source terms                21.6k    245μs    0.0%  11.3ns     0.00B    0.0%    0.00B
 I/O                              50    210ms    4.5%  4.21ms   13.7MiB    0.4%   281KiB
   save solution                 294    208ms    4.5%   709μs   12.2MiB    0.4%  42.7KiB
   ~I/O~                          50   1.81ms    0.0%  36.1μs    928KiB    0.0%  18.6KiB
   save mesh                      49    112μs    0.0%  2.28μs    602KiB    0.0%  12.3KiB
   get element variables         294    104μs    0.0%   352ns     0.00B    0.0%    0.00B
   get node variables            294   5.04μs    0.0%  17.1ns     0.00B    0.0%    0.00B
 calculate dt                    721   13.6ms    0.3%  18.8μs   9.10MiB    0.3%  12.9KiB
 ───────────────────────────────────────────────────────────────────────────────────────

Now:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 720 (accepted), 720 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               1.53s /  95.2%            539MiB / 100.0%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 copy to coupled boundaries    3.60k    907ms   62.2%   252μs    516MiB   95.8%   147KiB
 rhs!                          21.6k    382ms   26.2%  17.7μs   5.14KiB    0.0%    0.24B
   volume integral             21.6k    206ms   14.1%  9.52μs     0.00B    0.0%    0.00B
   interface flux              21.6k   59.4ms    4.1%  2.75μs     0.00B    0.0%    0.00B
   surface integral            21.6k   46.1ms    3.2%  2.13μs     0.00B    0.0%    0.00B
   boundary flux               21.6k   30.1ms    2.1%  1.40μs     0.00B    0.0%    0.00B
   Jacobian                    21.6k   20.6ms    1.4%   954ns     0.00B    0.0%    0.00B
   reset ∂u/∂t                 21.6k   11.1ms    0.8%   515ns     0.00B    0.0%    0.00B
   ~rhs!~                      21.6k   8.50ms    0.6%   393ns   5.14KiB    0.0%    0.24B
   source terms                21.6k    250μs    0.0%  11.6ns     0.00B    0.0%    0.00B
 I/O                              50    159ms   10.9%  3.17ms   13.7MiB    2.6%   281KiB
   save solution                 294    157ms   10.8%   534μs   12.2MiB    2.3%  42.7KiB
   ~I/O~                          50   1.47ms    0.1%  29.4μs    928KiB    0.2%  18.6KiB
   save mesh                      49   79.1μs    0.0%  1.61μs    602KiB    0.1%  12.3KiB
   get element variables         294   59.9μs    0.0%   204ns     0.00B    0.0%    0.00B
   get node variables            294   5.12μs    0.0%  17.4ns     0.00B    0.0%    0.00B
 calculate dt                    721   11.7ms    0.8%  16.2μs   9.10MiB    1.7%  12.9KiB
 ───────────────────────────────────────────────────────────────────────────────────────
github-actions[bot] commented 2 weeks ago

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

Code quality

Documentation

Testing

Performance

Verification

Created with :heart: by the Trixi.jl community.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 95.55556% with 4 lines in your changes missing coverage. Please review.

Project coverage is 96.16%. Comparing base (5398b22) to head (6c92dd1). Report is 3 commits behind head on main.

Files Patch % Lines
src/time_integration/methods_2N.jl 94.74% 2 Missing :warning:
src/time_integration/methods_3Sstar.jl 95.24% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1979 +/- ## ========================================== - Coverage 96.16% 96.16% -0.00% ========================================== Files 460 460 Lines 36958 36968 +10 ========================================== + Hits 35539 35548 +9 - Misses 1419 1420 +1 ``` | [Flag](https://app.codecov.io/gh/trixi-framework/Trixi.jl/pull/1979/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/trixi-framework/Trixi.jl/pull/1979/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | `96.16% <95.56%> (-<0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

SimonCan commented 2 weeks ago

Looks fine to me now. Thanks for the fix.

ranocha commented 2 weeks ago

We see a huge amount of allocations in CI:

https://github.com/trixi-framework/Trixi.jl/actions/runs/9544683326/job/26303876791?pr=1979#step:7:3571

efaulhaber commented 1 week ago

Okay, so the version on main is apparently allocation-free for simulations where all semidiscretizations have the same mesh, equations, solver and cache type. Then

mesh_other, equations_other, solver_other, cache_other = mesh_equations_solver_cache(other_semi_index,
                                                                                         1,
                                                                                         semi_coupled.semis...)

is type-stable, as the types of the return values will be the same over all values of other_semi_index. In cases where we have different equations or mesh types for the semidiscretizations, this allocates and is very slow, as it allocates inside a performance-relevant function.

The version in this PR always allocates, but one level higher, where the type-instability is not as performance-relevant.

efaulhaber commented 1 week ago

Okay, new version. Now I store boundary_condition.other_semi_index statically in the type of the BoundaryConditionCoupled, removing all type instabilities and allocations.

sloede commented 1 week ago

This is unbelievably ugly but I believe that @efaulhaber needs this quickly for a teaching project. IMHO OK to merge if @ranocha signs off as well.

Just to be clear: Thanks a lot to @efaulhaber for taking the time and figuring out how to make this allocation free. Your help is really appreciated!