sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file
http://sanger-pathogens.github.io/snp-sites/

Option to disallow gap and/or N and/or non-AGTC ? #28

Closed (tseemann closed 4 years ago)

tseemann commented 8 years ago

We are impressed by the speed of this tool (due to being C code).

A very useful feature we need is the ability to also filter out things like gaps, N, and other non-AGTC characters.

These would need to be independent options.

Ideally the current default behaviour of removing conserved (monomorphic) sites could also be an option, e.g. so we could remove only the columns containing a gap and leave the rest.
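To make the request concrete, here is a minimal Python sketch of independent per-column filters. It is not snp-sites code: the function name `filter_columns` and its keyword arguments are hypothetical, the alignment is assumed to be a list of equal-length strings, and a column counts as monomorphic only when every sequence has the same character, which may differ from how snp-sites itself treats gaps and N.

```python
def filter_columns(seqs, drop_gap=False, drop_n=False,
                   drop_non_acgt=False, drop_monomorphic=True):
    """Return the alignment restricted to columns passing every enabled filter.

    Hypothetical helper for illustration; each filter is switched on or off
    independently of the others.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have the same length")

    kept = []
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        if drop_gap and "-" in column:
            continue  # column contains at least one gap
        if drop_n and "N" in column:
            continue  # column contains at least one N
        if drop_non_acgt and any(c not in "ACGT" for c in column):
            continue  # column contains something other than A, C, G, T
        if drop_monomorphic and len(set(column)) == 1:
            continue  # every sequence has the same character here
        kept.append(i)

    # Rebuild each sequence from the surviving column indices.
    return ["".join(s[i] for i in kept) for s in seqs]
```

With `drop_gap=True, drop_monomorphic=False` this removes only the columns that contain a gap and leaves everything else, which is the case described above.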

ONeillMB1 commented 8 years ago

Great tool! We also are very impressed with the speed!

Building off the recommendation of @tseemann, it would also be nice to be able to impose thresholds for tolerable amounts of missing data. For example, if 75% of samples in the alignment have data and there exists a SNP among them, retain the data in the SNP alignment.
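A rough Python sketch of that threshold idea follows. It is only an illustration, not an existing snp-sites option: `snp_columns_with_coverage` and `min_fraction` are hypothetical names, "having data" is assumed to mean an A, C, G or T call, and the alignment is assumed to be a list of equal-length strings.

```python
def snp_columns_with_coverage(seqs, min_fraction=0.75):
    """Return indices of columns where enough samples have a called base
    and the called bases are not all identical.

    Hypothetical helper for illustration; gaps, N and any other non-ACGT
    characters are treated as missing data.
    """
    if not seqs:
        return []
    n = len(seqs)
    kept = []
    for i in range(len(seqs[0])):
        called = [s[i].upper() for s in seqs if s[i].upper() in "ACGT"]
        enough_data = len(called) / n >= min_fraction
        is_snp = len(set(called)) > 1
        if enough_data and is_snp:
            kept.append(i)
    return kept
```

The function returns column indices rather than a rewritten alignment, so the caller can decide whether to keep the missing-data characters in the output.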

andrewjpage commented 8 years ago

I've added in a 'pure' mode as discussed with @tseemann and a 'keep monomorphic' mode (so it will work with BEAST).

jamiethompson77 commented 4 years ago

This does work with amino acids though, right?

tseemann commented 4 years ago

@jamiethompson77 I think AGTC is quite hard-coded into the tool, sorry. SNP is in the name, and that is DNA-specific.