sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file
http://sanger-pathogens.github.io/snp-sites/

How to manage heterozygosity in SNP conversion? #98

Open Pryfed opened 3 years ago

Pryfed commented 3 years ago

Hello,

Sorry for this (I guess) basic question, but I could not find the answer in either the README.md file or the paper (Page et al. 2016).

I am trying to convert FASTA alignments into VCF files containing only the SNP sites, for downstream analyses. Some alignments are for nuclear markers, and I work on a polyploid organism, so I sometimes have more than two haplotypes for a given individual, but all are properly phased.

My FASTA input is formatted as follows:

Individual1_a
Allele-a-sequence
Individual1_b
Allele-b-sequence
Individual2_a
Allele-a-sequence
Individual2_b
Allele-b-sequence
Individual2_c
Allele-c-sequence
...

I used a basic command:

snp-sites -v -o out.vcf in.fas

And I did get a .vcf file. But in this file, each allele seems to be coded as a separate homozygous individual: I expected genotypes such as 0/0/1 or at least 0/1, but the output contains only 0, 1 and 2 (like haploid calls).

How can I get an output that takes phasing information and heterozygosity into account? Is there an option in snp-sites that I missed? Or do I have to adapt my input, and if so, how? (For example, losing the phasing information by merging the alleles, so that there is only one sequence per individual but with ambiguity codes?! Is that mandatory?)

Thank you for any answer.
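
One possible workaround for the question above, sketched here under stated assumptions and not as an snp-sites feature: post-process the haploid VCF so that the per-allele columns of each individual are joined into one phased genotype. The function name `merge_haploid_columns` is hypothetical. The sketch assumes that sample names follow the `Individual_allele` pattern shown in the post (everything before the last underscore is the individual), that the only FORMAT field is GT, and that each sample column holds a single haploid allele index.

```python
def merge_haploid_columns(vcf_lines, sep="_"):
    """Join haploid sample columns of the same individual into phased genotypes.

    Assumption: a sample named "Individual2_c" belongs to "Individual2",
    i.e. the individual is the part of the name before the last `sep`.
    """
    out = []
    groups = None  # individual name -> list of column indices, in file order
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are copied unchanged.
            out.append(line)
        elif line.startswith("#CHROM"):
            fields = line.split("\t")
            groups = {}
            # Sample columns start after the 9 fixed VCF columns.
            for idx in range(9, len(fields)):
                individual = fields[idx].rsplit(sep, 1)[0]
                groups.setdefault(individual, []).append(idx)
            out.append("\t".join(fields[:9] + list(groups)))
        elif line:
            if groups is None:
                raise ValueError("data line found before the #CHROM header")
            fields = line.split("\t")
            # "|" is the VCF separator for phased alleles.
            merged = ["|".join(fields[i] for i in cols)
                      for cols in groups.values()]
            out.append("\t".join(fields[:9] + merged))
    return out
```

Applied to the lines of out.vcf, a record with haploid calls 0, 1 for Individual1 and 0, 0, 2 for Individual2 would become 0|1 and 0|0|2. Individuals can end up with different ploidy in the same record, so it is worth checking whether the downstream tools accept that.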