For a very large number of particles, a script that distributes the job over MPI (perhaps with mpi4py) would be a very nice addition. The input to the script would be all of the parameters for the desired clustering statistic. The script itself should do the following:
- calculate the min-max x/y/z extents of the particles, including both the data and randoms (converting from spherical co-ordinates as appropriate)
- create 3-D cells with a minimum side of rmax and tile the spatial domain with these cells
- for periodic boundary conditions, create additional (ghost) tiles along each dimension
- for each cpu, assign spatial cell co-ordinates for the data and randoms (load-balancing as necessary)
- compute the pair-counts on each processor
- sum the results from each MPI task
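The decomposition steps above can be sketched as follows. This is only a serial sketch of the gridding and cell-assignment logic; the helper names (`tile_domain`, `assign_cells_to_ranks`) are hypothetical, and the actual mpi4py communication (e.g. `MPI.COMM_WORLD`) is only indicated in the comments:

```python
import numpy as np

def tile_domain(pos, rmax):
    """Tile the bounding box of `pos` (an N x 3 array of particle
    positions, already converted from spherical if needed) with cells
    whose sides are all >= rmax.  Returns the cell edges per axis."""
    lo = pos.min(axis=0)
    hi = pos.max(axis=0)
    # Number of cells per axis such that each cell side is at least rmax.
    ncells = np.maximum((hi - lo) // rmax, 1).astype(int)
    # For a periodic box, one extra layer of ghost cells would be
    # appended on each side of every dimension here.
    return [np.linspace(lo[d], hi[d], ncells[d] + 1) for d in range(3)]

def assign_cells_to_ranks(ncells_total, nranks):
    """Round-robin assignment of flattened cell indices to MPI ranks --
    a simple form of load balancing.  In the real script, each rank
    (MPI.COMM_WORLD.Get_rank()) would read only the particles falling
    in its cells, count pairs locally, and the partial counts would be
    combined with comm.reduce(..., op=MPI.SUM)."""
    return [np.arange(r, ncells_total, nranks) for r in range(nranks)]
```

A round-robin assignment is just the simplest choice; a production script would likely weight cells by their particle counts before distributing them.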
Reading in all of the "data" points per cell is usually not a problem (other than significant I/O time, but the compute time is likely to be higher).
If the spatial domain is roughly fixed (as is usually the case), then the randoms can be pre-emptively divided up based on the number of cpu tasks used. Each cpu can then read in only its appropriate portion of the randoms list.
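Pre-dividing the randoms amounts to computing, ahead of time, which slice of the randoms catalogue each task should read. A minimal sketch, where `split_randoms` is a hypothetical helper name:

```python
import numpy as np

def split_randoms(nrandoms, ntasks):
    """Return (start, stop) row ranges of the randoms catalogue for each
    of `ntasks` MPI tasks, so no task ever reads the full file.  The
    remainder is spread over the first few tasks to keep the split even."""
    counts = np.full(ntasks, nrandoms // ntasks)
    counts[: nrandoms % ntasks] += 1
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return [(int(offsets[i]), int(offsets[i + 1])) for i in range(ntasks)]
```

Each task would then seek to its `start` row and read `stop - start` randoms, e.g. `split_randoms(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`.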