precompute distances for faster merging

This PR speeds up the merging operation, which combines similar sources. This is often necessary as a post-processing step when running algorithms that operate on overlapping "padded" blocks.

Rather than use an all-to-all comparison as before, it now identifies the top k nearest neighbors for each region, and only considers merges for those nearest neighbors, so runtime goes with n*k instead of n*n where typically n>>k.

Speeds for the two versions below.

# regions	old	new
50	0.53s	0.12s
100	2.3s	0.28s
200	10.4s	0.65s
400	44.8s	1.47s

Also confirmed identical output on real-world examples with k=10 (the default).

thunder-project / thunder-extraction

precompute distances for faster merging #5