Use contiguous memory layout for neighbor lists

Based on #9.

For cheap artificial benchmarks like count_neighbors (see #18), the difference is huge. Real-life benchmarks are more interesting, though, so I ran a 2D TLSPH benchmark (of `interact!`) on a perturbed rectangular point cloud. The difference on a single thread is <1%.

On 64 threads, we're talking about a 14% speedup for 6.5 million particles and 44% for 26 million particles.

Interestingly, there is absolutely no speedup in 3D. I assume this is because there is more computation per particle-neighbor pair and the neighbor lists are >3x larger, so cache misses in the neighbor lists are insignificant compared to the rest of the work. Note that this memory layout is also GPU-compatible, as opposed to the Vector of Vectors.
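
To illustrate the idea without the diff at hand, here is a rough sketch of what a contiguous neighbor list layout can look like. It is written in Python to stay language-neutral. The class name `ContiguousNeighborLists`, the `backend` and `lengths` fields, and the fixed per-point capacity are assumptions for illustration, not necessarily what this PR implements.

```python
from array import array


class ContiguousNeighborLists:
    """Hypothetical sketch: one flat buffer plus a per-point neighbor count."""

    def __init__(self, n_points, max_neighbors):
        self.max_neighbors = max_neighbors
        # Single contiguous buffer with a fixed-size slot range per point.
        self.backend = array("q", [0]) * (n_points * max_neighbors)
        # Number of valid entries currently stored for each point.
        self.lengths = array("q", [0]) * n_points

    def push(self, point, neighbor):
        n = self.lengths[point]
        if n >= self.max_neighbors:
            raise IndexError("neighbor list capacity exceeded")
        # The list of `point` starts at offset point * max_neighbors.
        self.backend[point * self.max_neighbors + n] = neighbor
        self.lengths[point] = n + 1

    def neighbors(self, point):
        start = point * self.max_neighbors
        return self.backend[start:start + self.lengths[point]]


lists = ContiguousNeighborLists(n_points=3, max_neighbors=4)
for point, neighbor in [(0, 1), (0, 2), (1, 0), (2, 0)]:
    lists.push(point, neighbor)
print(list(lists.neighbors(0)))
```

In a layout like this, the lists of consecutive points sit next to each other in memory, and no per-list allocation or pointer chasing is needed. That is the property that would make it friendlier to caches and to GPUs than a vector of separately allocated vectors, at the cost of reserving `max_neighbors` slots for every point.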

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.60%. Comparing base (989d0c0) to head (ac18b79).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
+ Coverage   85.02%   87.60%   +2.57%
==========================================
  Files           9       10       +1
  Lines         207      250      +43
==========================================
+ Hits          176      219      +43
  Misses         31       31
```

| [Flag](https://app.codecov.io/gh/trixi-framework/PointNeighbors.jl/pull/10/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | Coverage Δ | |
|---|---|---|
| [unit](https://app.codecov.io/gh/trixi-framework/PointNeighbors.jl/pull/10/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | `87.60% <100.00%> (+2.57%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework#carryforward-flags-in-the-pull-request-comment) to find out more.
