manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

Gridlink slow for large datasets with copy_particles=False #187

Closed lgarrison closed 3 years ago

lgarrison commented 5 years ago

The new option to set copy_particles=False in v2.3 is slow in gridlink! There's a single-threaded sort of particles into cell order. For a 1 billion particle dataset, this takes 760 seconds instead of 50 seconds if copy_particles=True.

I think we can do better than the sort: we've already constructed the cell indices of the particles; we just need to place them there. The out-of-place algorithm is straightforward (it's what we're already doing when copy_particles=True); the in-place algorithm needs some more thought but should be possible.

manodeep commented 5 years ago

Agreed - we already have the necessary bits for a counting sort. I found this medium article to be very helpful with understanding counting sort.

We already have the count-array -- the lattice structure itself. We would have to create a temporary copy to perform the sort, but then we can free that after the sorting into cell-order is complete.

manodeep commented 5 years ago

We should be able to adapt this into C

manodeep commented 5 years ago

The counting sort, while fast, requires a (temporary) copy of the entire memory area being sorted. May be we should stick with a different sort.

lgarrison commented 3 years ago

Superseded by #240.