Open tseemann opened 2 years ago
updown will do what you want, and it will be faster (especially if you generate the CSV-format input with updown list first) for SARS-CoV-2 at the moment. D (--dist-all) for updown is an int: the number of SNPs.
But I can't see any reason not to extend this functionality to closest
too. As it stands I think D for closest would be a float which will be differences per site.
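To make the two units concrete, here is a minimal Python sketch of a SNP count (int) next to differences per site (float) for a pair of aligned sequences, plus a filter that keeps every target within D SNPs. It is an illustration only, not gofasta's implementation: the function names are hypothetical, and skipping any site where either sequence is not A/C/G/T is an assumption about how ambiguous sites are handled.

```python
from typing import Dict, List, Tuple


def snp_and_per_site_distance(a: str, b: str) -> Tuple[int, float]:
    """Return (SNP count, differences per site) for two aligned sequences.

    Assumption: only sites where both sequences have A/C/G/T are compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    snps = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                snps += 1
    # Per-site distance is the SNP count divided by the number of compared sites.
    per_site = snps / compared if compared else 0.0
    return snps, per_site


def all_within(query: str, targets: Dict[str, str], max_snps: int) -> List[str]:
    """Names of all targets whose SNP count from the query is <= max_snps."""
    return [
        name
        for name, seq in targets.items()
        if snp_and_per_site_distance(query, seq)[0] <= max_snps
    ]
```

The same threshold could be applied to the second element of the tuple if D is given per site rather than as a SNP count.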
Adding it to closest would be great, as it was not clear at first what updown did, and the interface for closest is much easier to get started on. And, as you said, the logic is already there. Thank you!
Still to do: optionally don't truncate the output (by first tiebreaking by genome completeness).
Thank you for writing gofasta - it has some of the things I wanted to implement, plus more.
The closest command seems to be able to find the single closest per query, or the N closest. Could there be an option to give all sequences within distance D?
Also, if there are C equally good matches, it tiebreaks by completeness. Could this be optionally able to provide all of the matches? In COVID we often have many identical sequences geographically spread so want all those matches.
Or should updown be used for this?
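As a rough illustration of the tie-handling request, the Python sketch below returns either one best match (ties broken by completeness) or all equally close matches. Everything here is hypothetical: the function names, the `keep_ties` option, and the definition of completeness as the fraction of A/C/G/T sites are assumptions for the example, not gofasta's actual behaviour or flags.

```python
from typing import Dict, List


def completeness(seq: str) -> float:
    """Fraction of sites that are A/C/G/T (an assumed definition)."""
    if not seq:
        return 0.0
    return sum(c in "ACGT" for c in seq.upper()) / len(seq)


def closest_matches(
    distances: Dict[str, int],
    sequences: Dict[str, str],
    keep_ties: bool = False,
) -> List[str]:
    """Return the closest target(s) given precomputed distances to one query.

    With keep_ties=False, ties are broken by picking the most complete genome.
    With keep_ties=True, every target at the minimum distance is returned.
    """
    if not distances:
        return []
    best = min(distances.values())
    tied = [name for name, d in distances.items() if d == best]
    if keep_ties:
        return sorted(tied)
    return [max(tied, key=lambda name: completeness(sequences[name]))]
```

For example, with two identical genomes at distance 0, `keep_ties=True` reports both, while the default reports only the more complete one.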