kylebittinger / brocc

Consensus taxonomy assignment for short reads (great for fungi)
GNU General Public License v3.0

Fix voting issues surrounding generic taxa #10

Closed kylebittinger closed 6 years ago

kylebittinger commented 6 years ago

Upon inspecting some assignments, we realized that the handling of generic taxa (e.g. "uncultured fungi") needed to be simplified. In particular, we hoped to verify that generic taxa were not included in the vote totals when deciding on a taxonomic assignment. In the process of fixing this, I made several substantial changes to the software:

  1. Identification of generic taxa is much improved. As implemented, any taxon descended from "environmental samples" or "unclassified _____" is marked as generic. By inspection, I have not seen any counter-examples in the NCBI taxonomy.
  2. A detailed voting log is included in the output directory. If an assignment was not made at a given rank, the voting log indicates the reason.
  3. By default, the leading candidate needs at least 4 BLAST hits for an assignment to be made. Previously, the fraction of generic taxa was used to determine whether enough candidates were present for voting. Under that system, some assignments were based on only one or two BLAST hits, which I thought was dangerous. We eliminated the old parameter for the fraction of generic taxa and introduced a new parameter that sets the minimum number of votes the leading candidate needs in order to be selected as the assignment. Note that the leading candidate must also have enough votes to reach the required consensus value (60% of votes for species, by default).
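
For reviewers, the rules in items 1 and 3 can be sketched roughly as follows. This is an illustrative sketch, not the code in this pull request: the names `is_generic`, `vote`, `min_votes`, and `min_consensus` are hypothetical, each hit is assumed to carry its lineage as a list of taxon names, and the consensus fraction is assumed to be computed over non-generic votes only.

```python
from collections import Counter


def is_generic(lineage):
    """Return True if any name in the lineage marks the taxon as generic.

    `lineage` is a list of taxon names from the root down to the taxon
    itself (an assumed representation, not brocc's actual data model).
    """
    return any(
        name == "environmental samples" or name.startswith("unclassified ")
        for name in lineage
    )


def vote(hits, min_votes=4, min_consensus=0.6):
    """Pick an assignment at one rank from (taxon_name, lineage) hits.

    Returns (winner, reason); winner is None when no assignment is made,
    and reason plays the role of a voting-log entry.
    """
    # Generic taxa are left out of the vote totals entirely.
    votes = Counter(name for name, lineage in hits if not is_generic(lineage))
    total = sum(votes.values())
    if total == 0:
        return None, "no non-generic votes"

    leader, count = votes.most_common(1)[0]
    if count < min_votes:
        return None, "too few votes for leading candidate"
    if count / total < min_consensus:
        return None, "consensus threshold not reached"
    return leader, "ok"


if __name__ == "__main__":
    albicans = ("Candida albicans", ["Fungi", "Candida", "Candida albicans"])
    uncultured = ("uncultured fungus",
                  ["Fungi", "environmental samples", "uncultured fungus"])
    print(vote([albicans] * 4 + [uncultured] * 6))
```

In this example the six generic hits are ignored, so the four remaining votes meet both the minimum vote count and the consensus threshold. Under the old fraction-based rule described in item 3, a single non-generic hit could have been enough to produce an assignment.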

kylebittinger commented 6 years ago

Requesting a review and comments from @ctanes