yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

Add --within-distance to matUtils extract (output all samples within k mutations of target samples) #251

Closed amkram closed 2 years ago

amkram commented 2 years ago

This adds two options to matUtils extract: --within-distance [filename] and --distance-threshold [int]. Used together, they write a TSV file listing all sample IDs within --distance-threshold mutations of the selected samples.

E.g. matUtils extract --within-distance output.tsv --distance-threshold 4 -s input_samples.tsv -i tree.pb.gz

The first column in the TSV output contains one sample from the input set per line, and the second column is a comma-separated list of all samples within the specified threshold of that line's sample.
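As a rough illustration of consuming that output, here is a minimal Python sketch that reads the two-column TSV into a dictionary. The function name parse_within_distance and the sample IDs are made up for the example. It also assumes the file has no header row and that a sample with no neighbors has an empty or missing second column; neither detail is stated in this PR, so check a real output file first.

```python
import io


def parse_within_distance(handle):
    """Map each input sample to the list of samples listed on its line.

    Assumed layout (from the PR description): column 1 is the input sample,
    column 2 is a comma-separated list of samples within the threshold.
    Assumption: no header row; an empty or absent column 2 means no neighbors.
    """
    neighbors = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        sample = fields[0]
        listed = fields[1] if len(fields) > 1 else ""
        # Drop empty strings so an empty column yields an empty list.
        neighbors[sample] = [s for s in listed.split(",") if s]
    return neighbors


# Hypothetical content standing in for a real output file.
example = "sampleA\tsampleB,sampleC\nsampleD\t\n"
result = parse_within_distance(io.StringIO(example))
for sample, close in result.items():
    print(sample, len(close), close)
```

For a real run, pass an open file instead of the StringIO object, e.g. `with open("output.tsv") as fh: parse_within_distance(fh)`.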

This also fixes a small bug in the logic for extract --closest-relatives.