Save some of `d_eff` values as scalars

Alternative to #14 without introducing another array

▶️ `julia --project -O3 --check-bounds=no examples/run_SHMIP.jl` with `nx = ny = 1024`:

Time = 14.999 sec, T_eff = 6.60 GB/s (iterations total = 1000)

(T_eff = 3.70 GB/s in main)

What is the difference to the main branch?

In the main branch, `d_eff` has to be calculated nine times at each grid point in each iteration:

- twice per flux (up- and downstream), for each of the four fluxes needed for the divergence, i.e. eight times (this is inefficient anyway, since only one of the up-/downstream values is actually used in each case)
- once for the CFL limiter in `dτ_ϕ`

Five of those nine `d_eff` values belong to the grid point `(ix, iy)` where `ϕ` is being updated, so the same calculation is done five times. The other four belong to the neighbouring points `(ix-1, iy)`, `(ix+1, iy)`, `(ix, iy-1)`, `(ix, iy+1)`; each of these is evaluated only once by a given thread (though the neighbouring threads evaluate them again for their own updates).
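To make the count concrete, here is a minimal sketch that tallies where `d_eff` gets evaluated for a single update of `ϕ`. All names (`d_eff_counted!`, `flux_counted!`, `update_naive!`), the `h^3` formula and the limiter are hypothetical placeholders, not the code in `modelonly.jl`; only the access pattern is meant to match the description above.

```julia
# Hypothetical stand-in for the real d_eff: it records the grid point it was
# evaluated at and returns a placeholder value.
function d_eff_counted!(calls, h, ix, iy)
    calls[(ix, iy)] = get(calls, (ix, iy), 0) + 1
    return h[ix, iy]^3
end

# Upstream flux across the face between neighbouring points a and b.
# Both d_eff values are evaluated, although only one of them is used.
function flux_counted!(calls, h, ϕ, a, b)
    dϕ  = ϕ[b...] - ϕ[a...]
    d_a = d_eff_counted!(calls, h, a...)
    d_b = d_eff_counted!(calls, h, b...)
    return -(d_a * min(dϕ, 0.0) + d_b * max(dϕ, 0.0))
end

# One update of ϕ at (ix, iy): four fluxes for the divergence plus a limiter.
function update_naive!(calls, h, ϕ, ix, iy)
    q_w = flux_counted!(calls, h, ϕ, (ix-1, iy), (ix, iy))
    q_e = flux_counted!(calls, h, ϕ, (ix, iy), (ix+1, iy))
    q_s = flux_counted!(calls, h, ϕ, (ix, iy-1), (ix, iy))
    q_n = flux_counted!(calls, h, ϕ, (ix, iy), (ix, iy+1))
    dτ  = 1.0 / (1.0 + d_eff_counted!(calls, h, ix, iy))  # placeholder limiter
    return ϕ[ix, iy] - dτ * ((q_e - q_w) + (q_n - q_s))
end

h = fill(1.0, 3, 3)
ϕ = [Float64(ix + 2iy) for ix in 1:3, iy in 1:3]
calls = Dict{Tuple{Int,Int},Int}()
update_naive!(calls, h, ϕ, 2, 2)

println("total evaluations:     ", sum(values(calls)))
println("at the central point:  ", calls[(2, 2)])
```

The tally per grid point in `calls` shows the central point being hit by all four fluxes and by the limiter, while each neighbour is hit once.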

Here I calculate the central `d_eff` at `(ix, iy)` once per grid point and keep it as a scalar, which removes at least part of the redundancy: https://github.com/pohlan/SheetModel.jl/blob/b2616d580fa61721b2211510b9b859e02fcd9177/src/modelonly.jl#L163-L166

This requires the definition of two different `flux_x` and `flux_y` macros: https://github.com/pohlan/SheetModel.jl/blob/b2616d580fa61721b2211510b9b859e02fcd9177/src/modelonly.jl#L86-L91
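The idea can be sketched with plain functions instead of macros. This is only an illustration under assumptions: `d_eff_toy`, `flux_both`, `flux_centre_lo`, `flux_centre_hi` and the two update functions are hypothetical names, and the `h^3` formula and limiter are placeholders. The sketch has one flux variant for faces where the central point is on the low-index side and one where it is on the high-index side; whether the linked macros are split the same way is an assumption.

```julia
# Placeholder for the real d_eff (the actual formula lives in modelonly.jl).
d_eff_toy(h, ix, iy) = h[ix, iy]^3

# Recomputes d_eff on both sides of the face between points a and b.
function flux_both(h, ϕ, a, b)
    dϕ = ϕ[b...] - ϕ[a...]
    return -(d_eff_toy(h, a...) * min(dϕ, 0.0) + d_eff_toy(h, b...) * max(dϕ, 0.0))
end

# Central point c on the low-index side (east/north faces); d_c is passed in.
function flux_centre_lo(d_c, h, ϕ, c, nb)
    dϕ = ϕ[nb...] - ϕ[c...]
    return -(d_c * min(dϕ, 0.0) + d_eff_toy(h, nb...) * max(dϕ, 0.0))
end

# Central point c on the high-index side (west/south faces); d_c is passed in.
function flux_centre_hi(d_c, h, ϕ, nb, c)
    dϕ = ϕ[c...] - ϕ[nb...]
    return -(d_eff_toy(h, nb...) * min(dϕ, 0.0) + d_c * max(dϕ, 0.0))
end

# Nine d_eff evaluations per update, five of them at (ix, iy).
function update_recompute(h, ϕ, ix, iy)
    c = (ix, iy)
    div_q = flux_both(h, ϕ, c, (ix+1, iy)) - flux_both(h, ϕ, (ix-1, iy), c) +
            flux_both(h, ϕ, c, (ix, iy+1)) - flux_both(h, ϕ, (ix, iy-1), c)
    dτ = 1.0 / (1.0 + d_eff_toy(h, ix, iy))
    return ϕ[ix, iy] - dτ * div_q
end

# Five d_eff evaluations per update: the central one once, plus four neighbours.
function update_scalar(h, ϕ, ix, iy)
    c   = (ix, iy)
    d_c = d_eff_toy(h, ix, iy)  # evaluated once, kept as a scalar
    div_q = flux_centre_lo(d_c, h, ϕ, c, (ix+1, iy)) - flux_centre_hi(d_c, h, ϕ, (ix-1, iy), c) +
            flux_centre_lo(d_c, h, ϕ, c, (ix, iy+1)) - flux_centre_hi(d_c, h, ϕ, (ix, iy-1), c)
    dτ = 1.0 / (1.0 + d_c)
    return ϕ[ix, iy] - dτ * div_q
end

h = [1.0 + 0.1ix + 0.2iy for ix in 1:3, iy in 1:3]
ϕ = [sin(ix) + cos(2iy) for ix in 1:3, iy in 1:3]
println(update_recompute(h, ϕ, 2, 2) ≈ update_scalar(h, ϕ, 2, 2))
```

Both variants should give the same update; the scalar version just avoids re-evaluating `d_eff` at the central point, without storing it in an additional array.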

pohlan / SheetModel.jl

Save some of `d_eff` values as scalars #17
