nhoffman / bioy

Tools for NGS sequence analysis and bacterial classification
GNU General Public License v3.0

unknown tax_ids in data/rank_thresholds.csv cause results with those tax_ids to be filtered out #32

Closed crosenth closed 9 years ago

crosenth commented 9 years ago

An example: when you blast our hm78 database against itself, some sequences come back as [no blast result]. Those sequences have tax_ids that are not in data/rank_thresholds.csv.
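A quick way to confirm this for a given reference set is to compare its tax_ids against the thresholds table. The snippet below is only a sketch: the `missing_tax_ids` helper, the `tax_id` column name, and the toy rows are assumptions for illustration, not the actual layout of data/rank_thresholds.csv.

```python
import csv
import io


def missing_tax_ids(reference_tax_ids, thresholds_csv, id_column="tax_id"):
    """Return the reference tax_ids that have no row in the thresholds table.

    `thresholds_csv` is any open text file-like object with a header row.
    The `id_column` name is an assumption about the file layout.
    """
    known = {row[id_column] for row in csv.DictReader(thresholds_csv)}
    return sorted(set(reference_tax_ids) - known)


# Toy stand-in for the thresholds file (hypothetical ids and values).
toy_thresholds = io.StringIO("tax_id,species\nT1,99.0\nT2,99.0\n")
print(missing_tax_ids(["T1", "T2", "T9"], toy_thresholds))  # ['T9']
```

Any tax_id reported here would be a candidate for the [no blast result] behaviour described above.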

Possible solutions:

1) Create rank_thresholds on the fly, based on data/rank_thresholds_defaults.csv.
2) Require each reference database to provide its own rank_thresholds.csv file.
3) Fall back to some default if a tax_id is not found.
4) Recreate the default threshold table with the latest NCBI tax_ids.
5) A combination of 2, 3 and 4.

crosenth commented 9 years ago

Did 4) and 3), with the fallback in 3) found dynamically by moving up the ranks through the taxtable. If even the highest rank (presumably root) has no match, that blast hit is filtered out.
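For readers following along, the fallback described in this comment can be sketched roughly as below. This is not the code that went into bioy: the `find_thresholds` helper, the `parents` mapping (a stand-in for the taxtable lineage), and all ids and threshold values are hypothetical.

```python
def find_thresholds(tax_id, parents, thresholds):
    """Walk from tax_id toward the root and return the first thresholds found.

    `parents` maps each tax_id to its parent tax_id (None at the root).
    `thresholds` maps tax_id to a thresholds row.
    Returns None when no ancestor, including the root, has an entry.
    """
    seen = set()  # guards against cycles in a malformed parent mapping
    current = tax_id
    while current is not None and current not in seen:
        if current in thresholds:
            return thresholds[current]
        seen.add(current)
        current = parents.get(current)
    return None


# Hypothetical lineage: new_species -> genus_a -> root
parents = {"new_species": "genus_a", "genus_a": "root", "root": None}
thresholds = {"genus_a": {"species": 99.0, "genus": 97.0}}

hits = [
    {"query": "seq1", "tax_id": "new_species"},  # resolved via genus_a
    {"query": "seq2", "tax_id": "orphan"},       # no lineage, no thresholds
]
# Hits whose lineage never reaches a known tax_id are filtered out.
kept = [h for h in hits if find_thresholds(h["tax_id"], parents, thresholds)]
print([h["query"] for h in kept])  # ['seq1']
```

Walking up the lineage means a newly added tax_id inherits the thresholds of its nearest known ancestor, so only hits with no known ancestor at all are dropped.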