thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

DNAbin2genind and ambiguity codes from Fasta #336

Open burbrink opened 2 years ago

burbrink commented 2 years ago

Hello,

It seems that when I read FASTA files that use ambiguity codes for heterozygous states (e.g., W = A/T) and convert them with DNAbin2genind, those sites are coded as NA in the genind object. Unfortunately, other functions that do this at the genome scale and rely on DNAbin2genind (e.g., multidna2genind) also pass these sites as NA to the genind object. Do you know of any fix for this?

Thanks so much!

Frank

zkamvar commented 2 years ago

Hello,

I'm not sure why, but DNAbin2genind has always assumed haploid sequences for FASTA input, which you can see here:

https://github.com/thibautjombart/adegenet/blob/78be588d418f8e5b0a05ebc2880917b1c6581054/R/sequences.R#L65

I believe that modifying DNAbin2genind with a flag that allows for heterozygous sites is possible. Would you like to make a pull request for this functionality?
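
Until such a flag exists, one possible workaround is to expand the two-base IUPAC codes into diploid genotypes yourself and build the genind object with `df2genind()` instead. The sketch below is only an illustration under stated assumptions: `iupac_to_genotypes()` and `iupac_diploid` are hypothetical names that are not part of adegenet, unambiguous bases are assumed to be homozygous calls, and every other symbol (`n`, `-`, `?`, three-base codes) is treated as missing. The input is assumed to be a character matrix of aligned bases, which is what `as.character()` should give for an aligned `DNAbin` matrix from ape.

```r
# IUPAC two-base ambiguity codes mapped to diploid genotypes ("allele/allele").
# Assumption: unambiguous bases are homozygous calls.
iupac_diploid <- c(
  a = "A/A", c = "C/C", g = "G/G", t = "T/T",
  r = "A/G", y = "C/T", s = "C/G",
  w = "A/T", k = "G/T", m = "A/C"
)

# Hypothetical helper (not part of adegenet): convert a character matrix of
# aligned bases (rows = individuals, columns = sites) into a character matrix
# of "allele/allele" genotypes. Any symbol not listed above becomes NA.
iupac_to_genotypes <- function(bases) {
  bases <- as.matrix(bases)
  geno <- matrix(
    unname(iupac_diploid[as.vector(tolower(bases))]),
    nrow = nrow(bases),
    dimnames = dimnames(bases)
  )
  if (is.null(colnames(geno))) {
    colnames(geno) <- paste0("site", seq_len(ncol(geno)))
  }
  geno
}

# Toy alignment: three individuals, three sites.
# With real data this matrix would come from the aligned sequences instead.
bases <- rbind(
  ind1 = c("a", "w", "g"),
  ind2 = c("a", "t", "r"),
  ind3 = c("c", "a", "n")
)

geno <- iupac_to_genotypes(bases)
print(geno)

# Build a diploid genind object if adegenet is installed.
if (requireNamespace("adegenet", quietly = TRUE)) {
  gi <- adegenet::df2genind(geno, sep = "/", ploidy = 2)
  print(gi)
}
```

Unlike DNAbin2genind, this sketch does not filter sites by polymorphism, so monomorphic sites would need to be dropped separately if that behaviour matters.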