trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License

Boundcheck #55

Closed ranocha closed 3 years ago

ranocha commented 4 years ago

In GitLab by @ranocha on May 18, 2020, 16:25

By default, Julia checks bounds on every array indexing operation. It is possible to tell Julia "Hey, I know this will be safe" by using @inbounds, typically after an initial bounds check has been done by something along the lines of

@boundscheck begin
  # whatever has to be checked here...
end
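
To make the interplay concrete, here is a minimal sketch of how `@boundscheck` and `@inbounds` fit together. The `Samples` type and the `getsample`/`total` functions are hypothetical names made up for illustration; they are not taken from Trixi.jl.

```julia
# Hypothetical wrapper type, used only for illustration.
struct Samples
    data::Vector{Float64}
end

# The check sits in an `@boundscheck` block, so a caller can elide it with
# `@inbounds` once this method has been inlined into the caller.
@inline function getsample(s::Samples, i::Int)
    @boundscheck checkbounds(s.data, i)
    return @inbounds s.data[i]
end

function total(s::Samples)
    acc = 0.0
    # `eachindex` only yields valid indices, so skipping the check is safe here.
    for i in eachindex(s.data)
        @inbounds acc += getsample(s, i)
    end
    return acc
end

println(total(Samples([1.0, 2.0, 3.0])))
```

Calling `getsample` without `@inbounds` still performs the check, so the unchecked path is opt-in at each call site.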

It is also possible to disable bounds checks entirely by starting Julia with julia --check-bounds=no. On the other hand, it is possible to enforce bounds checks everywhere, even inside @inbounds blocks, by starting Julia with julia --check-bounds=yes, which might be nice for tests.
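
To see which mode a running session was started with, something like the following sketch might help. It relies on `Base.JLOptions()`, which is an internal, undocumented API, and the mapping of the integer values (0 = default, 1 = yes, 2 = no) is an assumption based on current Julia versions; `check_bounds_mode` is a hypothetical helper name.

```julia
# Assumption: `Base.JLOptions().check_bounds` is 0 for the default,
# 1 for `--check-bounds=yes`, and 2 for `--check-bounds=no`.
# This is internal API and may change between Julia versions.
function check_bounds_mode()
    flag = Base.JLOptions().check_bounds
    return flag == 1 ? "yes" : flag == 2 ? "no" : "default"
end

println("bounds checking mode: ", check_bounds_mode())
```

This is meant as a debugging aid, e.g. to confirm that a test run really was started with enforced bounds checks.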

Without disabling bounds checks, further optimizations such as SIMD vectorization cannot be performed in general.
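
As a rough illustration of the SIMD point, the sketch below contrasts a plain loop with one annotated with `@inbounds @simd`. The function names are made up for this example. Whether the compiler actually vectorizes either loop depends on the Julia version and the hardware, and in simple cases it may prove the accesses safe on its own.

```julia
# Plain loop: each access `x[i]` may carry a bounds check, and the
# floating-point additions must be performed in order.
function sum_checked(x::Vector{Float64})
    acc = 0.0
    for i in 1:length(x)
        acc += x[i]
    end
    return acc
end

# `@inbounds` removes the bounds checks; `@simd` additionally allows the
# compiler to reorder the floating-point reduction so it can vectorize.
function sum_simd(x::Vector{Float64})
    acc = 0.0
    @inbounds @simd for i in eachindex(x)
        acc += x[i]
    end
    return acc
end

x = collect(1.0:100.0)
println(sum_checked(x) == sum_simd(x))
```

Because `@simd` may reorder additions, the two results can differ by rounding for general inputs; they agree exactly here only because all partial sums are small integers.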

An initial test starting from !43 yields

 -------------------------------------------------------------------------------
              trixi                     Time                   Allocations      
                                ----------------------   -----------------------
        Tot / % measured:            2.39s / 96.3%            350MiB / 72.2%    

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 -------------------------------------------------------------------------------
 main loop                   1    2.29s   100%   2.29s    240MiB  95.2%   240MiB
   rhs                     870    1.93s  84.1%  2.22ms   49.4MiB  19.5%  58.1KiB
     volume integral       870    1.43s  62.1%  1.64ms   37.9MiB  15.0%  44.6KiB
     prolong2surfaces      870    189ms  8.23%   218μs     0.00B  0.00%    0.00B
     surface integral      870    113ms  4.92%   130μs     0.00B  0.00%    0.00B
     surface flux          870    108ms  4.72%   125μs   5.07MiB  2.01%  5.97KiB
     Jacobian              870   48.9ms  2.13%  56.2μs     0.00B  0.00%    0.00B
     mortar flux           870   27.0ms  1.18%  31.1μs   6.36MiB  2.52%  7.48KiB
     reset ∂u/∂t           870   9.74ms  0.42%  11.2μs     0.00B  0.00%    0.00B
     prolong2mortars       870   7.06ms  0.31%  8.11μs     0.00B  0.00%    0.00B
     source terms          870    138μs  0.01%   159ns     0.00B  0.00%    0.00B
   Runge-Kutta step        870    163ms  7.11%   188μs     0.00B  0.00%    0.00B
   analyze solution          2    138ms  6.01%  69.1ms    166MiB  65.9%  83.1MiB
   calc_dt                 174   29.5ms  1.28%   169μs     0.00B  0.00%    0.00B
   I/O                      20   22.3ms  0.97%  1.12ms   24.6MiB  9.76%  1.23MiB
 mesh creation               1   7.34ms  0.32%  7.34ms   12.1MiB  4.80%  12.1MiB
   creation                  1   4.59ms  0.20%  4.59ms   9.92MiB  3.93%  9.92MiB
   initial refinement        1   2.70ms  0.12%  2.70ms   2.19MiB  0.87%  2.19MiB
   refinement patches        1   1.80μs  0.00%  1.80μs     80.0B  0.00%    80.0B
   coarsening patches        1    994ns  0.00%   994ns     80.0B  0.00%    80.0B
 read parameter file         1   1.61ms  0.07%  1.61ms   13.2KiB  0.01%  13.2KiB
 parse command line          1   5.13μs  0.00%  5.13μs      608B  0.00%     608B
 -------------------------------------------------------------------------------

for the current behavior for examples/parameters_ec_longrun.toml and

 -------------------------------------------------------------------------------
              trixi                     Time                   Allocations      
                                ----------------------   -----------------------
        Tot / % measured:            1.97s / 95.9%            350MiB / 72.2%    

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 -------------------------------------------------------------------------------
 main loop                   1    1.88s   100%   1.88s    240MiB  95.2%   240MiB
   rhs                     870    1.54s  81.4%  1.77ms   49.4MiB  19.5%  58.1KiB
     volume integral       870    1.22s  64.7%  1.40ms   37.9MiB  15.0%  44.6KiB
     prolong2surfaces      870    111ms  5.87%   128μs     0.00B  0.00%    0.00B
     surface flux          870    102ms  5.40%   117μs   5.07MiB  2.01%  5.97KiB
     surface integral      870   32.6ms  1.72%  37.5μs     0.00B  0.00%    0.00B
     mortar flux           870   31.2ms  1.65%  35.9μs   6.36MiB  2.52%  7.48KiB
     Jacobian              870   20.6ms  1.09%  23.7μs     0.00B  0.00%    0.00B
     reset ∂u/∂t           870   9.38ms  0.50%  10.8μs     0.00B  0.00%    0.00B
     prolong2mortars       870   6.90ms  0.37%  7.93μs     0.00B  0.00%    0.00B
     source terms          870    106μs  0.01%   122ns     0.00B  0.00%    0.00B
   Runge-Kutta step        870    164ms  8.67%   188μs     0.00B  0.00%    0.00B
   analyze solution          2    129ms  6.84%  64.6ms    166MiB  65.9%  83.1MiB
   calc_dt                 174   25.2ms  1.33%   145μs     0.00B  0.00%    0.00B
   I/O                      20   23.2ms  1.23%  1.16ms   24.6MiB  9.76%  1.23MiB
 mesh creation               1   6.52ms  0.34%  6.52ms   12.1MiB  4.80%  12.1MiB
   creation                  1   5.14ms  0.27%  5.14ms   9.92MiB  3.93%  9.92MiB
   initial refinement        1   1.33ms  0.07%  1.33ms   2.19MiB  0.87%  2.19MiB
   refinement patches        1   4.29μs  0.00%  4.29μs     80.0B  0.00%    80.0B
   coarsening patches        1    232ns  0.00%   232ns     80.0B  0.00%    80.0B
 read parameter file         1   1.06ms  0.06%  1.06ms   13.2KiB  0.01%  13.2KiB
 parse command line          1   3.98μs  0.00%  3.98μs      608B  0.00%     608B
 -------------------------------------------------------------------------------

with julia --check-bounds=no. That's a performance difference of ca. 20% in rhs (1.93s vs. 1.54s) and about 18% in total run time (2.39s vs. 1.97s).
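
The relative savings can be recomputed from the two timer tables. This is just the arithmetic on the numbers reported above; `saving_percent` is a hypothetical helper.

```julia
# Relative reduction in run time, in percent.
saving_percent(before, after) = 100 * (before - after) / before

# Times in seconds, copied from the two tables above.
timings = [
    ("total",           2.39, 1.97),
    ("main loop",       2.29, 1.88),
    ("rhs",             1.93, 1.54),
    ("volume integral", 1.43, 1.22),
]

for (name, before, after) in timings
    println(rpad(name, 16), round(saving_percent(before, after), digits=1), " %")
end
```

These are single runs, so the percentages should be read as rough indications rather than precise measurements.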

ranocha commented 4 years ago

In GitLab by @sloede on May 20, 2020, 10:08

So what exactly do you suggest we do? So far, Gregor's and my perspective on this has been that we do not want to sprinkle the code with @inbounds statements, as this makes it harder to read, understand, and sometimes modify. Instead, whenever we run a performance-critical simulation, we restart Julia anyway to enable threading and can then also supply --check-bounds=no.

ranocha commented 4 years ago

In GitLab by @ranocha on May 20, 2020, 10:09

That's also okay for me (for now).

What do you mean by "restart Julia anyways to enable threading"?

ranocha commented 4 years ago

In GitLab by @sloede on May 20, 2020, 10:31

By default, I (we?) do not have JULIA_NUM_THREADS set, thus Threads.nthreads() == 1. Personally, I only enable threading when I know I have a longer-running simulation.
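
For reference, a small sketch of how the thread count shows up inside a session. `threaded_squares` is a made-up example function; the point is only that `Threads.nthreads()` is fixed at startup, which is why a restart with JULIA_NUM_THREADS set is needed.

```julia
using Base.Threads

# Reports 1 unless Julia was started with JULIA_NUM_THREADS set.
println("running with ", nthreads(), " thread(s)")

# Each iteration writes to its own slot, so no synchronization is needed.
function threaded_squares(n::Int)
    out = zeros(Int, n)
    @threads for i in 1:n
        out[i] = i^2
    end
    return out
end

println(threaded_squares(4))
```

With a single thread the loop simply runs serially, so the same code works in both setups.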

ranocha commented 4 years ago

In GitLab by @ranocha on May 20, 2020, 10:32

Okay, thanks for the info. I just have JULIA_NUM_THREADS set to the number of CPU cores on my machine by default.

ranocha commented 3 years ago

Superseded by #210