tseemann / snp-dists

Pairwise SNP distance matrix from a FASTA sequence alignment
GNU General Public License v3.0

Include ambiguous bases but not Ns #43

Open GonzaloYebra opened 3 years ago

GonzaloYebra commented 3 years ago

Hi! I know there have been a few similar issues raised here, but they didn't quite match what I'm looking for...

My question is, would there be any way to run snp-dists including ambiguous bases in the calculation while disregarding Ns? Basically a hybrid between the default and the -a options.

In my case, I wouldn't mind what specific ambiguous base is found, I'd like to count them all as different to ATCG.

Any ideas?

Thanks a lot!

Gonzalo

tseemann commented 3 years ago

@GonzaloYebra Normally a match is +1 and a mismatch is 0. How do you want the ambiguous IUPAC codes measured?

A vs R = 0
C vs R = 0
A vs N = 0
R vs R = 1 <---- ?
N vs R = 0

Is that correct?
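
For reference, the hybrid rule requested above can be sketched in a few lines of Python. This is only an illustration of the idea, not how snp-dists is implemented: the function name `hybrid_distance` and the `ignored` parameter are hypothetical. It counts differences rather than matches, skips any column where either sequence has an N, and treats every other symbol, including IUPAC ambiguity codes, as a plain character. Under that assumption two identical ambiguity codes (R vs R) count as no difference, which is one possible answer to the open question in the table.

```python
# Illustrative sketch only; hybrid_distance is a hypothetical helper,
# not part of snp-dists.

def hybrid_distance(seq_a: str, seq_b: str, ignored: str = "N") -> int:
    """Count differing columns between two aligned sequences.

    Columns where either sequence has a symbol in `ignored` are skipped.
    All other symbols (A, C, G, T and IUPAC codes such as R or Y) are
    compared as plain characters, so R vs A is a difference and R vs R
    is not (an assumption, see the note above).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    skip = set(ignored.upper())
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # disregard N in either sequence
        if a != b:
            diffs += 1  # any other mismatch counts, ambiguous or not
    return diffs


if __name__ == "__main__":
    # R vs C counts as a difference; the N vs G column is skipped.
    print(hybrid_distance("ARNT", "ACGT"))
```

Passing `ignored="N-"` would also skip gap characters; whether gaps should be skipped is a separate choice that the thread does not settle.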