Please add a `percent_identity` column to the best_classifications table.
Define the percent identity of two aligned sequences as the percentage of alignment columns in which the two sequences match, excluding columns where either sequence has a gap.
For each classified pquery p, consider the collection of reference sequences that have the same classification as p. Compute the median percent identity of the pquery sequence with those reference sequences.
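To make the definition concrete, here is a minimal sketch in Python of how the calculation might look. The function names, the gap characters (`-` and `.`), the case-insensitive comparison, and returning `None` when no columns are comparable are all assumptions for illustration, not the existing code or its schema.

```python
from statistics import median

# Assumption: gaps in the alignment are written as '-' or '.'.
GAP_CHARS = frozenset("-.")


def percent_identity(a, b):
    """Percent of matching columns, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    # Assumption: comparison is case-insensitive.
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # gap column: excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return None  # no comparable columns, so identity is undefined
    return 100.0 * matches / compared


def median_percent_identity(query, references):
    """Median percent identity of one pquery against same-classification references."""
    values = []
    for ref in references:
        pid = percent_identity(query, ref)
        if pid is not None:
            values.append(pid)
    return median(values) if values else None
```

Here `references` stands for the aligned reference sequences that share the pquery's classification; how those are looked up is left out of the sketch.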
Check in with @nhoffman about this one; it was his idea.
I know it might feel redundant with the MAP-MRCA thing, but there are differences that will be important in the paper we are writing up.