Store coordinates together with the points in a cell list

I realized that the FullGridCellList (at least with the Vector{Vector} backend) is algorithmically almost identical to the method of CellListMap.jl. However, CellListMap.jl is slightly faster, both on a single thread and on multiple threads when modified to use Polyester.jl for threading.

The main difference is that @lmiq stores the coordinates together with the point indices in the cell lists. This avoids unordered accesses to the big coordinate array when fetching the coordinates of a neighbor. I implemented a similar data structure and made it configurable, as our goal is to have a playground to try out methods.
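To make the difference in memory layout concrete, here is a minimal sketch of the two variants. It is only an illustration under my own assumptions: `IndexedPoint`, `sum_sq_dist_indices`, and `sum_sq_dist_points` are hypothetical names and are not the types or functions used in this PR or in CellListMap.jl.

```julia
# Hypothetical illustration; names are not taken from PointNeighbors.jl or CellListMap.jl.
# A cell entry that carries the point index and a copy of its coordinates.
struct IndexedPoint{NDIMS, T}
    index  :: Int
    coords :: NTuple{NDIMS, T}
end

# Layout A: the cell stores only indices; coordinates live in one big array.
function sum_sq_dist_indices(cell::Vector{Int}, coords::Matrix{Float64},
                             x::NTuple{2, Float64})
    s = 0.0
    for j in cell
        # Each iteration reads column `j` of `coords`, which can be
        # far away in memory from the column read before it.
        dx = coords[1, j] - x[1]
        dy = coords[2, j] - x[2]
        s += dx^2 + dy^2
    end
    return s
end

# Layout B: the cell stores index and coordinates side by side,
# so the loop reads one contiguous vector.
function sum_sq_dist_points(cell::Vector{IndexedPoint{2, Float64}},
                            x::NTuple{2, Float64})
    s = 0.0
    for p in cell
        dx = p.coords[1] - x[1]
        dy = p.coords[2] - x[2]
        s += dx^2 + dy^2
    end
    return s
end

# Tiny example: four points in 2D, three of them in the same cell.
coords = [0.0 1.0 2.0 3.0;
          0.0 1.0 0.0 1.0]
cell_indices = [4, 1, 3]
cell_points = [IndexedPoint(j, (coords[1, j], coords[2, j])) for j in cell_indices]
x = (1.0, 0.0)

# Both layouts give the same result; only the memory access pattern differs.
@assert sum_sq_dist_indices(cell_indices, coords, x) == sum_sq_dist_points(cell_points, x)
```

The trade-off in this sketch is that layout B duplicates the coordinates, so they have to be rewritten into the cells whenever the points move, and each cell entry takes more memory.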

We now get very similar performance to CellListMap.jl. Here is a plot showing the speedup against CellListMap.jl on a single thread (Threadripper 3990X):

On 128 threads, we're still slightly slower:

Here is a plot showing the speedup from using PointWithCoordinates on different architectures.

We see the largest speedups (14-15% for a WCSPH interaction on 128 threads!) on the CPU. The Nvidia H100 also benefits from this data structure, while the RTX 3090 only gets 0.5-1% faster. For some reason, the AMD Instinct MI210 doesn't like this data structure at all and runs 2x slower.

trixi-framework / PointNeighbors.jl

Store coordinates together with the points in a cell list #52