nhoffman / bioy

Tools for NGS sequence analysis and bacterial classification
GNU General Public License v3.0

best N hits #52

Closed tyleraland closed 8 years ago

tyleraland commented 8 years ago

New argument --best N keeps only the best N hits for each query sequence, as ranked by absolute number of mismatches from the query.
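
To illustrate the filtering described above, here is a minimal sketch, not the actual bioy implementation. The function name `keep_best_n_hits` and the record keys `qseqid`, `sseqid`, and `mismatch` (borrowed from BLAST tabular column names) are assumptions made for the example. How ties are handled is also an assumption: this sketch keeps exactly N hits per query and breaks ties by input order.

```python
from collections import defaultdict


def keep_best_n_hits(hits, n):
    """Keep the n hits with the fewest mismatches for each query.

    `hits` is an iterable of dicts with (assumed) keys 'qseqid' and
    'mismatch'. Ties are broken by input order because sorted() is stable.
    """
    # Group hits by query sequence, preserving first-seen query order.
    by_query = defaultdict(list)
    for hit in hits:
        by_query[hit["qseqid"]].append(hit)

    kept = []
    for query_hits in by_query.values():
        # Rank by absolute number of mismatches, fewest first.
        ranked = sorted(query_hits, key=lambda h: h["mismatch"])
        kept.extend(ranked[:n])
    return kept


# Hypothetical example data, not taken from the capture pipeline.
hits = [
    {"qseqid": "q1", "sseqid": "a", "mismatch": 3},
    {"qseqid": "q1", "sseqid": "b", "mismatch": 0},
    {"qseqid": "q1", "sseqid": "c", "mismatch": 1},
    {"qseqid": "q2", "sseqid": "d", "mismatch": 2},
]
best = keep_best_n_hits(hits, 2)
```

With N = 2, query `q1` keeps its hits against `b` and `c`, and `q2` keeps its single hit against `d`.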

I've been comparing classifications with and without --best 2 for a particular data set generated by the capture pipeline (classifying individual query sequences), and the results look pretty promising. You can compare the two sets of classifications here:

/mnt/disk2/molmicro/working/tland9/2016-02-25_bioy_slashname_filtering/classifications-nobest
/mnt/disk2/molmicro/working/tland9/2016-02-25_bioy_slashname_filtering/classifications-best2

Potential Feedback:

nhoffman commented 8 years ago

Can we name the argument --best-n-hits?

crosenth commented 8 years ago

Don't forget to update the CHANGELOG.rst