matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

percent identity for best_classifications table #184

Closed matsen closed 12 years ago

matsen commented 12 years ago

Please add a percent_identity column to the best_classifications table.

Define the percent identity of two aligned sequences to be the percentage of alignment columns in which the two sequences match, excluding columns where either sequence has a gap.
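To make the definition concrete, here is a minimal Python sketch rather than pplacer code. The function name `percent_identity`, the choice of `-` and `.` as gap characters, the case-insensitive comparison, and returning `None` when no column can be compared are all assumptions made for illustration.

```python
# Hypothetical helper, not part of pplacer.
GAP_CHARS = frozenset("-.")  # assumed gap symbols


def percent_identity(seq_a, seq_b):
    """Percent of matching columns between two aligned sequences.

    Columns where either sequence has a gap are excluded from both the
    numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return None  # assumption: undefined when no column is comparable
    return 100.0 * matches / compared


# Five comparable columns, four of which match.
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

Note that the denominator is the number of gap-free columns, not the full alignment length, so a short query that matches perfectly wherever it overlaps a reference still scores 100.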

For each classified pquery p, consider the collection of reference sequences that have the same classification as p. Compute the median of the percent identities between the pquery sequence and each of those reference sequences.
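The per-pquery computation could be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation: the names `pairwise_identity` and `median_percent_identity`, the dictionaries mapping reference names to aligned sequences and to classifications, and the `None` result when no reference shares the classification are all hypothetical.

```python
from statistics import median


def pairwise_identity(seq_a, seq_b, gaps="-."):
    """Percent of matching columns, ignoring columns with a gap in either sequence."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in gaps and b not in gaps
    ]
    if not pairs:
        return None
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def median_percent_identity(query_seq, query_tax_id, ref_seqs, ref_tax_ids):
    """Median percent identity between a pquery and same-classification references.

    ref_seqs maps reference name -> aligned sequence (hypothetical layout).
    ref_tax_ids maps reference name -> classification (hypothetical layout).
    """
    identities = []
    for name, ref_seq in ref_seqs.items():
        if ref_tax_ids.get(name) != query_tax_id:
            continue  # only references classified the same as the pquery
        pid = pairwise_identity(query_seq, ref_seq)
        if pid is not None:
            identities.append(pid)
    # Assumption: report None when there is nothing to take a median of.
    return median(identities) if identities else None


ref_seqs = {"r1": "ACGT", "r2": "ACGA", "r3": "TTTT"}
ref_tax_ids = {"r1": "A", "r2": "A", "r3": "B"}
print(median_percent_identity("ACGT", "A", ref_seqs, ref_tax_ids))  # 87.5
```

Here r1 and r2 share the pquery's classification and score 100 and 75, so the median is 87.5; r3 is ignored because it has a different classification.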

Check in with @nhoffman about this one; it was his idea.

I know it might feel redundant with the MAP-MRCA thing, but there are differences that will be important in the paper we are writing up.