Closed Sn0flingan closed 6 years ago
Ambiguous bases (and amino acids) are supported (see the matrix below), but matches to N and X get negative scores, so that long runs against unknown bases are discouraged. One could certainly argue that the scores for any base aligned with N or X should be 0, not -1, while N:N and X:X should be -1, but those values can cause problems with the statistics when there are long runs of NNN.
You can alter the scoring matrix by taking a matrix in the data/ directory and editing it to suit your needs.
Bill Pearson
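As a rough illustration of the kind of edit suggested above, here is a minimal Python sketch that sets every known-base-vs-N/X score to 0 while keeping N:N, X:X and N:X at -1. It assumes a plain whitespace-delimited square layout like the matrix shown in this thread; the files in data/ may be laid out differently, and the helper names (`parse_matrix`, `rescore_ambiguous`, `format_matrix`) are made up for this example. Keep in mind the caveat above that such values can cause problems with the statistics on long runs of N.

```python
def parse_matrix(text):
    """Parse a whitespace-delimited square matrix into (symbols, scores)."""
    rows = [line.split() for line in text.strip().splitlines()]
    symbols = rows[0]  # header row: one symbol per column
    scores = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        if len(values) != len(symbols):
            raise ValueError(f"row {label!r} has {len(values)} scores, "
                             f"expected {len(symbols)}")
        for symbol, value in zip(symbols, values):
            scores[(label, symbol)] = int(value)
    return symbols, scores


def rescore_ambiguous(symbols, scores, ambiguous=("N", "X"),
                      versus_known=0, versus_ambiguous=-1):
    """Return a copy of scores with new values for pairs involving N or X."""
    edited = dict(scores)
    for a in symbols:
        for b in symbols:
            if a in ambiguous and b in ambiguous:
                edited[(a, b)] = versus_ambiguous   # N:N, X:X, N:X
            elif a in ambiguous or b in ambiguous:
                edited[(a, b)] = versus_known       # e.g. A:N, X:G
    return edited


def format_matrix(symbols, scores):
    """Write the matrix back out in the same aligned text layout."""
    lines = [" " + "".join(f"{s:>3}" for s in symbols)]
    for a in symbols:
        lines.append(a + "".join(f"{scores[(a, b)]:>3}" for b in symbols))
    return "\n".join(lines)


# A six-symbol subset of the matrix in this thread, for illustration only.
EXAMPLE = """
    A  C  G  T  N  X
A   5 -4 -4 -4 -1 -1
C  -4  5 -4 -4 -1 -1
G  -4 -4  5 -4 -1 -1
T  -4 -4 -4  5 -1 -1
N  -1 -1 -1 -1 -1 -1
X  -1 -1 -1 -1 -1 -1
"""

symbols, scores = parse_matrix(EXAMPLE)
edited = rescore_ambiguous(symbols, scores)
print(format_matrix(symbols, edited))
```

The edited text would then be saved as a new matrix file rather than overwriting the original, so the default scoring stays available for comparison.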
    A  C  G  T  U  R  Y  M  W  S  K  D  H  V  B  N  X
A   5 -4 -4 -4 -4  2 -1  2  2 -1 -1  1  1  1 -2 -1 -1
C  -4  5 -4 -4 -4 -1  2  2 -1  2 -1 -2  1  1  1 -1 -1
G  -4 -4  5 -4 -4  2 -1 -1 -1  2  2  1 -2  1  1 -1 -1
T  -4 -4 -4  5  5 -1  2 -1  2 -1  2  1  1 -2  1 -1 -1
U  -4 -4 -4  5  5 -1  2 -1  2 -1  2  1  1 -2  1 -1 -1
R   2 -1  2 -1 -1  2 -2 -1  1  1  1  1 -1  1 -1 -1 -1
Y  -1  2 -1  2  2 -2  2 -1  1  1  1 -1  1 -1  1 -1 -1
M   2  2 -1 -1 -1 -1 -1  2  1  1 -1 -1  1  1 -1 -1 -1
W   2 -1 -1  2  2  1  1  1  2 -1  1  1  1 -1 -1 -1 -1
S  -1  2  2 -1 -1  1  1  1 -1  2  1 -1 -1  1  1 -1 -1
K  -1 -1  2  2  2  1  1 -1  1  1  2  1 -1 -1  1 -1 -1
D   1 -2  1  1  1  1 -1 -1  1 -1  1  1 -1 -1 -1 -1 -1
H   1  1 -2  1  1 -1  1  1  1 -1 -1 -1  1 -1 -1 -1 -1
V   1  1  1 -2 -2  1 -1  1 -1  1 -1 -1 -1  1 -1 -1 -1
B  -2  1  1  1  1 -1  1 -1 -1  1  1 -1 -1 -1  1 -1 -1
N  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
X  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
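To make the relative penalties concrete: in the matrix above, A:X scores -1 while a real mismatch such as A:G scores -4, so the two are not treated as equally bad. The short Python sketch below sums per-position scores for a few ungapped pairs using only the A, C, G, T and X entries copied from that matrix. The function name `ungapped_score` is made up for this example, and a plain per-column sum is only an illustration, not how glsearch actually computes or reports its alignments.

```python
# A, C, G, T and X entries copied from the matrix in this thread.
SYMBOLS = "ACGTX"
ROWS = [
    [ 5, -4, -4, -4, -1],  # A
    [-4,  5, -4, -4, -1],  # C
    [-4, -4,  5, -4, -1],  # G
    [-4, -4, -4,  5, -1],  # T
    [-1, -1, -1, -1, -1],  # X
]
SCORES = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(SYMBOLS)
    for j, b in enumerate(SYMBOLS)
}


def ungapped_score(seq_a, seq_b, scores=SCORES):
    """Sum the substitution scores of two equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(scores[(a, b)] for a, b in zip(seq_a, seq_b))


print(ungapped_score("ACGT", "ACGT"))  # all identities: 5 + 5 + 5 + 5 = 20
print(ungapped_score("ACXT", "ACGT"))  # X against G:    5 + 5 - 1 + 5 = 14
print(ungapped_score("ACAT", "ACGT"))  # A against G:    5 + 5 - 4 + 5 = 11
```

With these values an X in the query costs 6 points relative to an identity, while a true A/G mismatch costs 9, so the ambiguous position is penalized less than the real mismatch.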
Are ambiguous bases supported?
It seems like the programs accept "X" as an ambiguous character and make good alignments (at least with glsearch). But for scoring, it seems like a match between X and A is considered a mismatch. I am worried that this might lead to errors in other sequences, where a true mismatch (e.g. A with G) is considered just as good/bad as X matched with A.
(See file for example alignment) ambiguous_missmatched.txt