trixi-framework / PointNeighbors.jl

PointNeighbors.jl: Neighborhood search with fixed search radius in Julia
https://trixi-framework.github.io/PointNeighbors.jl/
MIT License

Make `foreach_neighbor` loop run on AMD GPUs #49

Closed: efaulhaber closed this pull request 6 days ago.

efaulhaber commented 1 week ago

The `Iterators.flatten` construct does not work inside AMD GPU kernels.

This version is also faster than the original code with `Iterators.flatten` on the CPU:

[Image: benchmark plot]

This is a speedup of about 20% for the very cheap count-neighbors benchmark.
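The comment does not show the diff itself. As a rough sketch, assuming the change replaces an `Iterators.flatten` over the points of all neighboring cells with an explicit nested loop (first over the cells, then over the points in each cell), the two styles might look like this. The cell-list layout (`cells[c]` is a vector of point indices) and the names `foreach_point_flatten` and `foreach_point_nested` are hypothetical and are not the PointNeighbors.jl API.

```julia
# Hypothetical cell list: `cells[c]` holds the indices of the points in cell `c`.
# `neighbor_cells` lists the cells adjacent to the query cell (assumed precomputed).

# Flattened style: one lazy iterator over all points in all neighboring cells.
function foreach_point_flatten(f, cells, neighbor_cells)
    for point in Iterators.flatten(cells[c] for c in neighbor_cells)
        f(point)
    end
    return nothing
end

# Nested-loop style: loop over the neighboring cells, then over the points in each cell.
function foreach_point_nested(f, cells, neighbor_cells)
    for c in neighbor_cells
        for point in cells[c]
            f(point)
        end
    end
    return nothing
end

# Usage: both variants visit the same points in the same order.
cells = [[1, 2], [3], Int[], [4, 5, 6]]
neighbor_cells = [1, 3, 4]
visited_flatten = Int[]
foreach_point_flatten(p -> push!(visited_flatten, p), cells, neighbor_cells)
visited_nested = Int[]
foreach_point_nested(p -> push!(visited_nested, p), cells, neighbor_cells)
println(visited_flatten == visited_nested)
```

The nested version avoids building a lazy flattened iterator object, which plausibly makes it easier for the GPU compiler to handle and cheaper on the CPU; the PR itself does not state the exact reason.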

For an actual WCSPH simulation, the new code is faster for small problems, but the difference disappears as the problem becomes larger:

[Image: WCSPH 3D benchmark plot]
```julia
julia> plot_benchmarks(benchmark_wcsph, (7.37, 7.37, 7.37), 9, title="WCSPH 3D")
```

| Particles | Original code | New code |
|---|---|---|
| 7x7x7 = 343 | 44.701 μs | 36.191 μs |
| 12x12x12 = 1728 | 233.506 μs | 230.556 μs |
| 19x19x19 = 6859 | 1.116 ms | 984.332 μs |
| 29x29x29 = 24389 | 3.519 ms | 3.364 ms |
| 47x47x47 = 103823 | 15.201 ms | 15.154 ms |
| 74x74x74 = 405224 | 72.060 ms | 62.068 ms |
| 118x118x118 = 1643032 | 266.734 ms | 266.295 ms |
| 187x187x187 = 6539203 | 1.091 s | 1.082 s |
| 297x297x297 = 26198073 | 4.369 s | 4.375 s |

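To quantify how the gap shrinks with problem size, the ratio of original to new runtime can be computed directly from the WCSPH 3D timings listed above. This is only arithmetic on the reported numbers; the variable names are illustrative.

```julia
# WCSPH 3D timings from the benchmark output, converted to seconds.
particles  = [343, 1728, 6859, 24389, 103823, 405224, 1643032, 6539203, 26198073]
t_original = [44.701e-6, 233.506e-6, 1.116e-3, 3.519e-3, 15.201e-3,
              72.060e-3, 266.734e-3, 1.091, 4.369]
t_new      = [36.191e-6, 230.556e-6, 984.332e-6, 3.364e-3, 15.154e-3,
              62.068e-3, 266.295e-3, 1.082, 4.375]

# Ratio > 1 means the new code is faster; < 1 means it is slower.
speedup = t_original ./ t_new

for (n, s) in zip(particles, speedup)
    println(n, " particles: ", round(s; digits=3))
end
```

Rounded, the ratio is about 1.24 for 343 particles, about 1.16 for the 405224-particle run (which stands out from its neighbors), and within about 1% of 1.0 from 1643032 particles upward, dipping slightly below 1 for the largest run.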
codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.81%. Comparing base (768c62a) to head (ebd89fe).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #49      +/-   ##
==========================================
+ Coverage   88.60%   89.81%   +1.20%
==========================================
  Files          16       16
  Lines         474      481       +7
==========================================
+ Hits          420      432      +12
+ Misses         54       49       -5
```

| [Flag](https://app.codecov.io/gh/trixi-framework/PointNeighbors.jl/pull/49/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | Coverage Δ | |
|---|---|---|
| [unit](https://app.codecov.io/gh/trixi-framework/PointNeighbors.jl/pull/49/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework) | `89.81% <100.00%> (+1.20%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=trixi-framework#carryforward-flags-in-the-pull-request-comment) to find out more.
