`get_short_paths` needs to return the intersection of `good_samples` with `samples_to_check`. This PR changes `good_samples` from a vector to an unordered set and adds a version of `sample_intersect` that takes an unordered set instead of a vector. The intersection then costs O(N) lookups at O(1) average time each, instead of O(N) × O(M), which is impractical when both N and M are ~10M.
This fixes the specific problem with `--max-path-length` in #243. However, extract.cpp calls `sample_intersect` in several places with two vectors as input. Unless one of the vectors is always very small, it would be a good idea to use `std::unordered_set` instead of `std::vector` for at least one of the sets (ideally the larger one; it wouldn't hurt to use an unordered set in all cases).
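
For reference, here is a minimal sketch of the set-based intersection described above. It is not the code from this PR: the name `sample_intersect_sketch`, the use of `std::string` as the sample type, and the exact signature are assumptions made for illustration, and the real `sample_intersect` may differ.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the set-based sample_intersect; the name,
// signature, and std::string sample type are assumptions for this sketch.
// Returns the elements of samples_to_check that are also in good_samples,
// preserving the order of samples_to_check.
std::vector<std::string> sample_intersect_sketch(
    const std::unordered_set<std::string>& good_samples,
    const std::vector<std::string>& samples_to_check) {
    std::vector<std::string> result;
    for (const auto& sample : samples_to_check) {
        // Hash lookup: O(1) on average, versus an O(M) scan of a vector.
        if (good_samples.count(sample) > 0) {
            result.push_back(sample);
        }
    }
    return result;
}
```

With `good_samples = {"a", "b", "c"}` and `samples_to_check = {"c", "x", "a"}`, this returns `{"c", "a"}`. Building the set from an existing vector is a one-time O(M) average cost, which is why hashing the larger of the two inputs tends to pay off most.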