Open veghp opened 4 years ago
A current workaround is to replace all characters (ATCG...) in one of the strings with another set of characters (#@;&...) and define penalties between the two sets of characters (alphabets), at the cost of halving the number of allowed characters.
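The workaround described above can be sketched roughly as follows. This is only an illustration: the names `remap` and `cross_costs` are hypothetical, the substitute symbols are arbitrary placeholders, and how the resulting costs are passed to the package is left out.

```python
# Sketch of the two-alphabet workaround (all names here are hypothetical).
DNA = "ATCG"
ALT = "#@;&"  # placeholder symbols standing in for A, T, C, G

# str.maketrans with two equal-length strings maps ordinals to ordinals.
TO_ALT = str.maketrans(DNA, ALT)


def remap(seq):
    """Rewrite a sequence in the second alphabet."""
    return seq.translate(TO_ALT)


def cross_costs(cost_same=0.0, cost_diff=1.0):
    """Substitution costs between a DNA letter and an ALT symbol.

    The pair (letter, its own placeholder) plays the role of the
    diagonal, so a non-zero self-substitution cost can be expressed.
    """
    return {
        (a, b): cost_same if TO_ALT[ord(a)] == ord(b) else cost_diff
        for a in DNA
        for b in ALT
    }
```

Only one of the two strings is remapped, so every comparison is between a letter and a placeholder, and no character is ever compared with itself.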
Thank you for this great package that helps me in comparing short sequences (https://github.com/Edinburgh-Genome-Foundry/Examples/tree/master/SeqDistance).
I'm wondering if it would be possible to add a feature: self-substitution costs. Currently the diagonal of the substitution matrix seems to be ignored.
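To make the request concrete, here is a minimal pure-Python sketch of a weighted edit distance that does look up the cost of substituting a character with itself. It is not this package's implementation; `weighted_distance`, `sub_cost` and `indel` are names assumed for illustration only.

```python
def weighted_distance(a, b, sub_cost, indel=1.0):
    """Weighted Levenshtein distance that honours diagonal entries.

    sub_cost maps (char_a, char_b) to a substitution cost. Pairs that
    are missing fall back to 0.0 for identical characters and 1.0
    otherwise. Insertions and deletions both cost `indel`.
    """
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * indel]
        for j, cb in enumerate(b, start=1):
            default = 0.0 if ca == cb else 1.0
            # The lookup happens even when ca == cb, so a
            # self-substitution cost such as ("S", "S") is applied.
            cost = sub_cost.get((ca, cb), default)
            cur.append(min(prev[j] + indel,      # deletion
                           cur[j - 1] + indel,   # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]
```

With `sub_cost = {("S", "S"): 0.5}`, comparing `"AS"` with itself gives 0.5 rather than 0.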
To expand on this a bit, we use some characters to encode multiple characters (e.g. S = C or G), that is, to encode uncertainty. In this case, assuming C and G are equally likely, the chance that two Ss encode the same letter is 50%, so the penalty score should be 0.5.
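The 0.5 figure can be reproduced with a small sketch. It assumes that each ambiguity code stands for its bases with equal probability and that the two positions are independent; the table covers only a subset of the IUPAC codes, and the function names are hypothetical.

```python
# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def match_probability(x, y):
    """Probability that codes x and y encode the same base.

    Assumes uniform, independent draws from each code's base set.
    """
    sx, sy = set(IUPAC[x]), set(IUPAC[y])
    return len(sx & sy) / (len(sx) * len(sy))


def substitution_cost(x, y):
    """Penalty defined as the probability of a mismatch."""
    return 1.0 - match_probability(x, y)
```

Under these assumptions `substitution_cost("S", "S")` is 0.5, while an unambiguous letter compared with itself still costs 0.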