wrpearson / fasta36

Git repository for FASTA36 sequence comparison software
Apache License 2.0

Ambiguous bases are they supported? #12

Closed. Sn0flingan closed this issue 6 years ago

Sn0flingan commented 6 years ago

Are ambiguous bases supported?

It seems that the programs accept "X" as an ambiguous character and make good alignments (at least with glsearch). For scoring, however, a match between X and A appears to be treated as a mismatch. I am worried that this might lead to errors in other sequences, where a true mismatch (e.g. A with G) is considered just as good or bad as X matched with A.

(See file for example alignment) ambiguous_missmatched.txt

wrpearson commented 6 years ago

Ambiguous bases (and amino acids) are supported (see the matrix below), but matches to N and X get negative scores, so that long runs against unknown bases are discouraged. One could certainly argue that the *:N and *:X values (any base aligned with N or X) should be 0, not -1, while N:N and X:X should be -1, but those values can cause problems with the statistics when there are long runs of NNN.

You can alter the scoring matrix by taking a matrix in the data/ directory and editing it to suit your needs.
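As a rough illustration of that kind of edit, here is a minimal Python sketch that sets every base-versus-N/X score to 0 while leaving N:N, X:X and N:X untouched. The text layout it parses (a header row of symbols followed by labelled rows) simply mirrors the sample matrix in this thread and is an assumption; the files in data/ may use a different layout, and the helper names `parse_matrix`, `neutralize_wildcards` and `format_matrix` are hypothetical, not part of FASTA36.

```python
# Hypothetical helpers; the matrix text layout is an assumption, not the
# confirmed format of the files in the fasta36 data/ directory.
WILDCARDS = {"N", "X"}

EXAMPLE = """
    A   C   N   X
A   5  -4  -1  -1
C  -4   5  -1  -1
N  -1  -1  -1  -1
X  -1  -1  -1  -1
"""


def parse_matrix(text):
    """Parse a whitespace-delimited matrix into (columns, {(row, col): score})."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    cols = rows[0]
    scores = {}
    for fields in rows[1:]:
        label, values = fields[0], fields[1:]
        if len(values) != len(cols):
            raise ValueError(f"row {label!r} has {len(values)} values, expected {len(cols)}")
        for col, value in zip(cols, values):
            scores[(label, col)] = int(value)
    return cols, scores


def neutralize_wildcards(scores, wildcards=WILDCARDS, new_score=0):
    """Return a copy in which exactly-one-wildcard pairs (e.g. A:N) get new_score."""
    out = dict(scores)
    for row, col in scores:
        # True only when one symbol is a wildcard and the other is not.
        if (row in wildcards) != (col in wildcards):
            out[(row, col)] = new_score
    return out


def format_matrix(cols, scores):
    """Write the matrix back out in the same assumed text layout."""
    lines = [" " + "".join(f"{c:>4}" for c in cols)]
    for row in cols:
        lines.append(row + "".join(f"{scores[(row, c)]:>4}" for c in cols))
    return "\n".join(lines)


if __name__ == "__main__":
    cols, scores = parse_matrix(EXAMPLE)
    print(format_matrix(cols, neutralize_wildcards(scores)))
```

Note that, as mentioned above, zero scores against N or X can cause problems with the statistics when sequences contain long runs of N, so a matrix edited this way should be checked on real data.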

Bill Pearson

Sample DNA matrix

    A   C   G   T   U   R   Y   M   W   S   K   D   H   V   B   N   X
A   5  -4  -4  -4  -4   2  -1   2   2  -1  -1   1   1   1  -2  -1  -1
C  -4   5  -4  -4  -4  -1   2   2  -1   2  -1  -2   1   1   1  -1  -1
G  -4  -4   5  -4  -4   2  -1  -1  -1   2   2   1  -2   1   1  -1  -1
T  -4  -4  -4   5   5  -1   2  -1   2  -1   2   1   1  -2   1  -1  -1
U  -4  -4  -4   5   5  -1   2  -1   2  -1   2   1   1  -2   1  -1  -1
R   2  -1   2  -1  -1   2  -2  -1   1   1   1   1  -1   1  -1  -1  -1
Y  -1   2  -1   2   2  -2   2  -1   1   1   1  -1   1  -1   1  -1  -1
M   2   2  -1  -1  -1  -1  -1   2   1   1  -1  -1   1   1  -1  -1  -1
W   2  -1  -1   2   2   1   1   1   2  -1   1   1   1  -1  -1  -1  -1
S  -1   2   2  -1  -1   1   1   1  -1   2   1  -1  -1   1   1  -1  -1
K  -1  -1   2   2   2   1   1  -1   1   1   2   1  -1  -1   1  -1  -1
D   1  -2   1   1   1   1  -1  -1   1  -1   1   1  -1  -1  -1  -1  -1
H   1   1  -2   1   1  -1   1   1   1  -1  -1  -1   1  -1  -1  -1  -1
V   1   1   1  -2  -2   1  -1   1  -1   1  -1  -1  -1   1  -1  -1  -1
B  -2   1   1   1   1  -1   1  -1  -1   1   1  -1  -1  -1   1  -1  -1
N  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
X  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
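
To make the scoring difference concrete, the following Python sketch scores ungapped alignments using only the A, C, G, T, N and X entries of the sample matrix above. Under these values a base aligned with X costs -1, whereas a true mismatch such as A with G costs -4. The function name `ungapped_score` is hypothetical, and whether a given FASTA36 program uses exactly these values by default is an assumption rather than something stated in this thread.

```python
# Subset of the sample DNA matrix quoted in this thread (A, C, G, T, N, X only).
SYMBOLS = "ACGTNX"
ROWS = {
    "A": [5, -4, -4, -4, -1, -1],
    "C": [-4, 5, -4, -4, -1, -1],
    "G": [-4, -4, 5, -4, -1, -1],
    "T": [-4, -4, -4, 5, -1, -1],
    "N": [-1, -1, -1, -1, -1, -1],
    "X": [-1, -1, -1, -1, -1, -1],
}
SCORE = {
    (row, col): value
    for row, values in ROWS.items()
    for col, value in zip(SYMBOLS, values)
}


def ungapped_score(query, subject):
    """Sum the pairwise matrix scores of two equal-length, gap-free sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(SCORE[(q, s)] for q, s in zip(query.upper(), subject.upper()))


if __name__ == "__main__":
    print(ungapped_score("ACGT", "ACGT"))  # identical: 5 + 5 + 5 + 5 = 20
    print(ungapped_score("ACGT", "ACXT"))  # G aligned with X: 5 + 5 - 1 + 5 = 14
    print(ungapped_score("ACGT", "ACAT"))  # G aligned with A: 5 + 5 - 4 + 5 = 11
```

So with this matrix an alignment through an X is penalized, but less heavily than one through a true mismatch.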