SNP calling - GitHub issues

peflanag commented 4 years ago

Hi,

This is more a question than an issue.

I use Nullarbor on Staph and C. diff samples. Once I have performed the WGS and run fastp on the raw FASTQ files, I run them through Nullarbor. However, the samples I am testing are clinical isolates and are not always of the same MLST type. It's my understanding that, when carrying out the analysis, the reference strain should be as close to the isolates as possible. So, ST22 isolates should be compared against an ST22 reference strain. But until I run Nullarbor I don't know what the MLST types of the isolates are. Therefore I have been running the mlst program separately on the isolates and then grouping them accordingly before running Nullarbor on each group.

My question is, does it matter what reference I use? Is Nullarbor doing wgMLST or cgMLST? My understanding is that if it is using cgMLST then the MLST type of the reference isn't as important, because cgMLST looks only at the conserved, non-repetitive regions.

Also, if I only analyse samples grouped by MLST type, then any tree will contain just that MLST type. Is it possible to merge the data to show multiple MLST types on one tree if each group is mapped to a reference of its own MLST type?

Cheers,

Peter

andersgs commented 4 years ago

Hi @peflanag, under the hood Nullarbor uses Snippy to map each sample's reads to the reference and identify SNPs between the reference and the sample, and then combines the results across samples to identify core SNPs. So, the reference is very important.

We have just posted a manuscript on bioRxiv that sorts through some of the important parameters to consider, including the reference: https://www.biorxiv.org/content/10.1101/2020.09.24.310821v1

peflanag commented 4 years ago

Cheers @andersgs

So going forward I should determine the ST using MLST and group accordingly before running Nullarbor with a reference of the same ST.

P
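
For anyone following the same route, the grouping step could be sketched roughly as below. This is only an illustration, assuming the usual tab-separated mlst output (file, scheme, ST, then allele columns); the sample names and allele values are made up, and the function name `group_by_st` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_by_st(mlst_tsv):
    """Group sample names by (scheme, ST) from tab-separated mlst output.

    Assumes each line is: FILE <tab> SCHEME <tab> ST <tab> alleles...
    Samples without an assigned ST (reported as "-") end up in their own group.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(mlst_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        sample, scheme, st = row[0], row[1], row[2]
        groups[(scheme, st)].append(sample)
    return dict(groups)


# Made-up example input, for illustration only
example = (
    "iso1.fa\tsaureus\t22\tarcC(7)\n"
    "iso2.fa\tsaureus\t22\tarcC(7)\n"
    "iso3.fa\tsaureus\t8\tarcC(3)\n"
    "iso4.fa\tsaureus\t-\tarcC(~7)\n"
)

groups = group_by_st(example)
for (scheme, st), samples in sorted(groups.items()):
    print(scheme, st, samples)
```

Each group can then be written to its own Nullarbor input file and paired with a reference of the matching ST. Isolates with no assigned ST need a manual decision about which group, if any, they belong to.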

andersgs commented 4 years ago

You can also use make preview to quickly assess the diversity of your samples and identify groups of more closely related samples, and remove outlier samples.

Here is the link in the docs: https://github.com/tseemann/nullarbor#quick-preview-mode
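
As a rough sketch of how the preview step might be scripted per ST group: the helper below only builds the command strings and does not run anything. The project name, file paths and output directory are hypothetical, and the `nullarbor.pl` options shown (`--name`, `--ref`, `--input`, `--outdir`) are written from memory of the README, so please check them against `nullarbor.pl --help` for your installed version.

```python
import shlex


def nullarbor_preview_commands(name, ref, input_tab, outdir):
    """Return shell command strings to set up a Nullarbor run and preview it.

    Nothing is executed here; the strings are meant to be inspected first.
    Option names are assumptions to be verified against `nullarbor.pl --help`.
    """
    setup = [
        "nullarbor.pl",
        "--name", name,
        "--ref", ref,
        "--input", input_tab,
        "--outdir", outdir,
    ]
    # `make -C DIR target` runs the target from inside DIR
    preview = ["make", "-C", outdir, "preview"]
    return [shlex.join(setup), shlex.join(preview)]


# Hypothetical ST22 group
cmds = nullarbor_preview_commands("ST22", "ST22_ref.fa", "ST22_samples.tab", "ST22_out")
for cmd in cmds:
    print(cmd)
```

Printing the commands instead of running them makes it easy to review them before committing compute time to each group.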

tseemann / mlst

SNP calling #99