sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file
http://sanger-pathogens.github.io/snp-sites/

Output invariant sites and nucleotide frequencies #62

Open EpiDemos82 opened 6 years ago

EpiDemos82 commented 6 years ago

In general, phylogenetic programs use invariant sites in their likelihood calculations, and a SNP-only alignment leaves those sites out. However, a number of programs, such as RAxML and BEAST, can perform ascertainment bias corrections given the number of invariant sites and the nucleotide frequencies in the original alignment. If SNP-sites reported these values, they could be used as direct inputs for RAxML, for example.
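A minimal sketch of the kind of count being asked for, using the toy alignment quoted later in this thread. It assumes a strict definition that the thread does not specify: a column is invariant only when every sample carries the same unambiguous base, so columns containing gaps or N are not counted. The function name `constant_site_counts` is hypothetical and not part of SNP-sites.

```python
def constant_site_counts(seqs):
    """Count columns where all samples share one unambiguous base (A, C, G or T)."""
    counts = {base: 0 for base in "ACGT"}
    for column in zip(*seqs):
        distinct = {c.upper() for c in column}
        if len(distinct) == 1:
            (base,) = distinct
            # Assumption: all-gap or all-N columns are not invariant sites
            if base in counts:
                counts[base] += 1
    return counts

alignment = ["AGACACAGTCAC", "AGACAC----AC", "AAACGCATTCAN"]
print(constant_site_counts(alignment))
```

A looser definition that ignores gaps and N within a column would give larger counts; which one a downstream program expects should be checked against its documentation.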

tseemann commented 6 years ago

I second this suggestion.

Either add a -s (stats?) option to report all sorts of columnar statistics, characters used etc.

OR

Always output this to stderr as part of the logs.

andrewjpage commented 6 years ago

Grand, give me a toy example and I'll sort it out

EpiDemos82 commented 6 years ago

From your example in the README:

sample1 AGACACAGTCAC
sample2 AGACAC----AC
sample3 AAACGCATTCAN

-s (or stderr) would produce:

Input stats:
Alignment length: 12
Proportion Ns: 0.03
Proportion Gap sites: 0.11
Nucleotide frequencies (A,G,C,T): 45.2,12.9,32.3,9.7

Output stats:
SNP alignment length: 3
Number Gap sites (-) introduced: 1
Proportion gap sites: 0.11
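The figures proposed above can be reproduced with a short sketch. It rests on assumptions the thread does not state: proportions of N and gaps are taken over all characters in the alignment, nucleotide frequencies are percentages over A, C, G and T only, and a column counts as a SNP when it holds more than one distinct base among A, C, G, T (gaps and N ignored). All function and key names are hypothetical, not SNP-sites options or output.

```python
from collections import Counter

BASES = "ACGT"

def input_stats(seqs):
    """Summarise the full alignment; proportions are over all characters."""
    chars = Counter("".join(seqs).upper())
    total = sum(chars.values())
    called = sum(chars[b] for b in BASES)  # unambiguous bases only
    return {
        "alignment_length": len(seqs[0]),
        "proportion_n": round(chars["N"] / total, 2),
        "proportion_gap": round(chars["-"] / total, 2),
        # Percentages in the A,G,C,T order used in the thread
        "nucleotide_pct": [round(100 * chars[b] / called, 1) for b in "AGCT"],
    }

def snp_columns(seqs):
    """Assumed rule: a column is a SNP if it holds >1 distinct base of A/C/G/T."""
    return [
        i for i, col in enumerate(zip(*seqs))
        if len({c.upper() for c in col} & set(BASES)) > 1
    ]

def output_stats(seqs):
    """Summarise the SNP-only alignment built from the SNP columns."""
    cols = snp_columns(seqs)
    gaps = sum(seq[i] == "-" for seq in seqs for i in cols)
    cells = len(cols) * len(seqs)
    return {
        "snp_alignment_length": len(cols),
        "gap_sites_introduced": gaps,
        "proportion_gap": round(gaps / cells, 2) if cells else 0.0,
    }

alignment = ["AGACACAGTCAC", "AGACAC----AC", "AAACGCATTCAN"]
print(input_stats(alignment))
print(output_stats(alignment))
```

Under these assumptions the sketch returns the same numbers as the proposal: 0.03 for Ns, 0.11 for gaps, 45.2/12.9/32.3/9.7 for A/G/C/T, and a three-column SNP alignment with one gap.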

Thinking about this more, I came up with a couple of other useful stats. seqtk comp can produce similar stats, but having them in one tool with the speed of SNP-sites would be great.

tseemann commented 6 years ago

But output it in a machine-readable format so we can parse or JSON-ate it.
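One possible shape for such machine-readable output, sketched as JSON written to stderr. The key names and the `report` helper are hypothetical; the values are the ones proposed earlier in the thread, not output from SNP-sites.

```python
import json
import sys

# Values taken from the proposal in this thread; key names are illustrative only
example_stats = {
    "input": {
        "alignment_length": 12,
        "proportion_n": 0.03,
        "proportion_gap": 0.11,
        "nucleotide_pct": {"A": 45.2, "G": 12.9, "C": 32.3, "T": 9.7},
    },
    "output": {
        "snp_alignment_length": 3,
        "gap_sites_introduced": 1,
        "proportion_gap": 0.11,
    },
}

def report(stats, stream=sys.stderr):
    """Write the stats as one JSON document so downstream tools can parse it."""
    json.dump(stats, stream, indent=2, sort_keys=True)
    stream.write("\n")

report(example_stats)
```

Writing to stderr keeps the statistics out of the alignment on stdout, so the two can be redirected separately.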

slvrshot commented 3 years ago

@andrewjpage

Hello. I was curious if this was ever implemented?