manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Clustering with an extremely large number of particles #114

Closed: manodeep closed this issue 7 years ago

manodeep commented 7 years ago

For a very large number of particles, a script that distributes the job over MPI (perhaps with mpi4py) would be a very nice addition. The input to the script would be all of the parameters for the desired clustering statistic. The script itself should do the following (a rough sketch follows the list):

  1. Calculate the min/max x/y/z extents of the particles, including both the data and the randoms (converting from spherical coordinates as appropriate)
  2. Create 3-D cells with side length at least rmax and tile the spatial domain with these cells
  3. For periodic boundary conditions, create an additional layer of tiles along each dimension
  4. Assign spatial cells (for both data and randoms) to each CPU, load-balancing as necessary
  5. Compute the pair counts on each processor
  6. Sum the results from all MPI tasks.
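A minimal mpi4py sketch of these steps might look like the following. To keep it short, it simplifies steps 2-4 to a 1-D slab decomposition along x (rather than full 3-D cells with load-balancing), assumes a non-periodic box and bin edges starting above zero, and assumes `Corrfunc.theory.DD` accepts a sequence of bin edges for `binfile`. It is an illustration of the idea, not a proposed implementation:

```python
# distributed_dd.py -- sketch only; run as e.g.
#   mpiexec -n 8 python distributed_dd.py
import numpy as np
from mpi4py import MPI
from Corrfunc.theory import DD

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rmax = 25.0                            # largest separation requested
bins = np.linspace(0.1, rmax, 21)      # bin edges; min > 0 excludes self-pairs

# Step 1: for brevity every rank generates the full particle set here;
# a real script would read only its own sub-volume from disk.
rng = np.random.default_rng(42)
pos = rng.uniform(0.0, 500.0, size=(10**6, 3))

# Steps 2-4, simplified to slabs along x:
# rank r owns points with x in [edges[r], edges[r+1]).
edges = np.linspace(pos[:, 0].min(), pos[:, 0].max() + 1e-6, size + 1)
mine = (pos[:, 0] >= edges[rank]) & (pos[:, 0] < edges[rank + 1])
# All points within rmax (in x) of the slab form the second set, so every
# pair with 3-D separation <= rmax is seen by the owning rank.
halo = (pos[:, 0] >= edges[rank] - rmax) & (pos[:, 0] < edges[rank + 1] + rmax)
d1, d2 = pos[mine], pos[halo]

# Step 5: cross pair-counts (autocorr=0) between the owned slab and its
# rmax-padded region.
res = DD(autocorr=0, nthreads=4, binfile=bins,
         X1=d1[:, 0], Y1=d1[:, 1], Z1=d1[:, 2],
         X2=d2[:, 0], Y2=d2[:, 1], Z2=d2[:, 2],
         periodic=False)
local = res['npairs'].astype(np.int64)

# Step 6: sum the per-rank histograms on rank 0.
total = np.zeros_like(local)
comm.Reduce(local, total, op=MPI.SUM, root=0)
if rank == 0:
    print(total)
```

Counting cross-pairs between each owned slab and its padded halo means every unordered pair is counted exactly twice across all ranks (once by each owner), so the reduced histogram matches a full-box all-pairs count.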

Reading in all of the "data" points per cell is usually not a problem (the I/O time may be significant, but the compute time is likely to dominate).

If the spatial domain is roughly fixed (as is usually the case), then the randoms can be pre-emptively divided up among the number of CPU tasks in use. Each CPU can then read in only its own randoms list (see the sketch below).
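A one-off pre-processing step along the same lines might look like this; the file names and the slab decomposition (matching the sketch above, including the rmax halo) are made up for illustration:

```python
# split_randoms.py -- hypothetical one-off step: split a randoms catalogue
# into one file per MPI task so each task later reads only its own piece.
import numpy as np

ntasks, rmax = 8, 25.0
randoms = np.load("randoms_xyz.npy")   # assumed (N, 3) array of positions

edges = np.linspace(randoms[:, 0].min(), randoms[:, 0].max() + 1e-6,
                    ntasks + 1)
for r in range(ntasks):
    keep = ((randoms[:, 0] >= edges[r] - rmax) &
            (randoms[:, 0] < edges[r + 1] + rmax))
    np.save(f"randoms_rank{r:03d}.npy", randoms[keep])
```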

manodeep commented 7 years ago

If you need this as a feature, please comment on this issue.

manodeep commented 7 years ago

Apparently I really want this feature -- see #127 :). Closing this one.