yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License
121 stars 41 forks source link

get_short_paths: intersect good_samples with samples_to_check using std::unordered_set #244

Closed AngieHinrichs closed 2 years ago

AngieHinrichs commented 2 years ago

get_short_paths needs to return the intersection of good_samples with samples_to_check. This PR changes good_samples into an unordered set instead of vector, and adds a version of sample_intersect that takes an unordered set instead of vector, so the intersection is O(N)O(constant) instead of O(N)O(M) which is impractical when both N and M are ~10M.

This fixes the specific problem with --max-path-length in #243, but extract.cpp calls sample_intersect in several places with two vectors as input; unless one of the vectors is always very small, it would be a good idea to use std::unordered_set instead of std::vector for at least one of the sets (ideally the larger set; and it wouldn't hurt to use unordered set in all cases).