thunder-project / thunder-extraction

algorithms for feature extraction from spatio-temporal data
MIT License
6 stars 4 forks source link

precompute distances for faster merging #5

Closed freeman-lab closed 8 years ago

freeman-lab commented 8 years ago

This PR speeds up the merging operation, which combines similar sources. This is often necessary as a post-processing step when running algorithms that operate on overlapping "padded" blocks.

Rather than use an all-to-all comparison as before, it now identifies the top k nearest neighbors for each region, and only considers merges for those nearest neighbors, so runtime goes with n*k instead of n*n where typically n>>k.

Speeds for the two versions below.

# regions old new
50 0.53s 0.12s
100 2.3s 0.28s
200 10.4s 0.65s
400 44.8s 1.47s

Also confirmed identical output on real-world examples with k=10 (the default).