sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file
http://sanger-pathogens.github.io/snp-sites/

specify reference sequence #44

Closed: george-githinji closed this issue 8 years ago

george-githinji commented 8 years ago

How can one specify which sequence is the reference, so that comparisons are made relative to it? Does the tool assume that the first sequence in the alignment is the reference?

tseemann commented 8 years ago

Yes, the first sequence in the multi-FASTA is assumed to be the "reference". I don't think it matters which sequence is the "reference", as you should get the same output file even if you reorder the sequences. But maybe @andrewjpage can confirm that!
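To illustrate why reordering should not change which sites are reported, here is a minimal Python sketch. It is not the snp-sites implementation: `find_snp_sites` is a hypothetical name, and treating `-`, `N` and `?` as missing data is an assumption. Each column is reduced to a set of distinct bases, so the result cannot depend on sequence order.

```python
# Hypothetical sketch, not the snp-sites source code.
# Assumption: '-', 'N' and '?' mark missing data and are ignored.
MISSING = set("-N?")

def find_snp_sites(seqs):
    """Return 0-based column indices where the aligned sequences carry
    more than one distinct non-missing base."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be the same length")
    sites = []
    for col in range(length):
        # A set ignores order, so reordering sequences gives the same answer.
        bases = {s[col].upper() for s in seqs} - MISSING
        if len(bases) > 1:
            sites.append(col)
    return sites

alignment = ["ACGTA", "ACTTA", "N-GTC"]
print(find_snp_sites(alignment))                  # [2, 4]
print(find_snp_sites(list(reversed(alignment))))  # [2, 4]
```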

andrewjpage commented 8 years ago

The software generates its own internal reference. The "reference" can be assumed to be the first sequence; however, this only matters if you are using the VCF output, where it populates the REF column. For all other outputs, the order of the sequences doesn't make any difference. If there is missing data in the first sequence, it looks to the next sequence to try to populate those bases.
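The reference-filling behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not snp-sites code: `build_reference` is an invented name, `-`, `N` and `?` are assumed to mean missing data, and falling back to `N` when every sequence is missing at a column is also an assumption. It shows why the VCF REF column can change when the sequences are reordered, even though the set of SNP columns does not.

```python
# Hypothetical sketch of building an internal reference, not snp-sites code.
# Assumptions: '-', 'N' and '?' are missing data; an all-missing column gets 'N'.
MISSING = set("-N?")

def build_reference(seqs):
    """For each column, take the base from the first sequence that has
    non-missing data there, falling through to later sequences as needed."""
    ref = []
    for col in range(len(seqs[0])):
        base = next(
            (s[col].upper() for s in seqs if s[col].upper() not in MISSING),
            "N",  # assumed fallback when every sequence is missing here
        )
        ref.append(base)
    return "".join(ref)

alignment = ["AC-TA", "ACGTA", "NCTTC"]
print(build_reference(alignment))                          # ACGTA
print(build_reference(["NCTTC", "AC-TA", "ACGTA"]))        # ACTTC
```

In the first call the gap at column 2 of the first sequence is filled from the second sequence (`G`); moving the third sequence to the front changes the reference bases at columns 2 and 4, which is the order dependence that would surface in a REF column.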