more efficient algorithm(s) for MPI_Comm_split

Dear developers,

I was just checking your code for MPI_Comm_split() in src/mpi/comm/comm_split.c. Unfortunately, even after 10 years it still seems to use a very old and inefficient algorithm. Please switch to one (or more) of my much more efficient algorithms published in "Parallel Sorting with Minimal Data" (presented at the EuroMPI conference in 2011).

Short summary: You're still using qsort() or insertion sort to sort all keys on all processes in step 4. However, since you already collected all color/key values previously on all processes using Allgather(), this is way too much work. qsort() needs O(n log n) comparisons on average but can be as slow as O(n^2) if you're unlucky. Insertion sort always behaves as bad as O(n^2). Fortunately, with a very simple counting algorithm (Listing 1.2 in the paper) you can reduce this immediately to O(n) comparisons in all cases. This will not only simplify your code (getting rid of those sorting implementations) but making it substantially faster.

In addition, since Bill Gropp was worrying about future scalability (e.g., in his 2010 EuroMPI paper with P. Sack), I also presented two algorithms that don't require O(n) memory but get along with as little as O(1). Furthermore, my scalable algorithm has a running time of merely O(log^2 p), making it future-proof in all aspects. Even in practice, it is ~90 times faster than your current implementation for ~300k MPI processes.

Hint: for more details regarding the scalable algorithm, you might also want to consider reading "Scalable and Efficient Parallel Selection" (PPAM 2013).

Regards

Christian Siebert

pmodels / mpich

more efficient algorithm(s) for MPI_Comm_split #5204