Closed sam-baird closed 1 year ago
Hey Sam! Thanks for letting us know about this issue. We'll be taking a look in more detail on Monday, but I wanted to let you know we have seen this and are working on a resolution!
Hey Sam, thanks for raising this issue. You are correct: in v1.0.1 (and prior versions) the core_genome boolean is unset (but acts as false) by default, and our docs say the opposite. We will correct this in the docs.
But to get at the deeper issue: we will be updating the workflow to have core_genome set to true by default, and thus the snp_sites core genome alignment will be generated.
We're also planning to update the iqtree2 task so that modelfinder is automatically run (unless the user defines their own model). We previously had some logic to select a model based on the core_genome input, but now it will be up to the user to define their own or allow modelfinder to do its thing.
core_genome is set to true
tasks/phylogenetic_inference/task_iqtree2.wdl
core_genome and selecting model based on that boolean. Update task to simply use user-defined model or allow model finder to run.
core_genome as input to task
core_genome = true by default in snippy_tree workflow
@sam-baird Thanks for raising this issue. We have merged the PR, so you can now use the updated Snippy_Streamline workflow by using the main branch in Terra.
The core_genome input is now set to true by default and there is no default model for iqtree2, so if you would like to use a specific model, you will need to provide it through the optional iqtree2_model input parameter. Otherwise iqtree2 will run its modelfinder and automatically choose a model for you.
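For anyone following along, the model-selection behaviour described above can be sketched roughly as below. This is only an illustration in Python, not the code in task_iqtree2.wdl: the function name build_iqtree2_command and the alignment filename are hypothetical, and passing -m MFP to request ModelFinder is an assumption about how the task might do it (in IQ-TREE 2, -m MFP selects ModelFinder Plus, which is also what runs when -m is omitted).

```python
from typing import List, Optional


def build_iqtree2_command(alignment: str, iqtree2_model: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble an iqtree2 command line.

    If the user supplies a model, it is passed through unchanged.
    Otherwise ModelFinder is requested with "-m MFP" (assumed here;
    the real WDL task may simply omit the -m flag instead).
    """
    model = iqtree2_model if iqtree2_model else "MFP"
    return ["iqtree2", "-s", alignment, "-m", model]


if __name__ == "__main__":
    # No model given: ModelFinder picks one.
    print(" ".join(build_iqtree2_command("core_alignment.fasta")))
    # User-defined model: used as-is.
    print(" ".join(build_iqtree2_command("core_alignment.fasta", "GTR+G")))
```

The point of the sketch is that the choice no longer depends on core_genome at all; only the optional iqtree2_model input decides whether ModelFinder runs.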
These changes will be incorporated into the next version release, but we don't have a timeline for that just yet. It may be another few weeks before we release a new version.
Let us know if you have any questions!
Hello Theiagen Team,
I'm performing SNP and phylogenetic analysis for a set of bacterial samples on Terra using PHB v1.0.1. When I compared the core genome SNP matrix output from the kSNP3 workflow to the SNP matrix output from the Snippy_Streamline workflow, I was surprised to see that the pairwise SNP distances were significantly higher on average for Snippy_Streamline. I had assumed that snp-dists used a core genome alignment for Snippy_Streamline, because the docs on Notion indicate that the default value for core_genome is true, and I had not explicitly set this attribute. When I reran the workflow with core_genome explicitly set to true, the SNP distances were much closer to the kSNP3 core genome SNP distances.
Looking at wf_snippy_tree.wdl, when core_genome is not explicitly set, the snp-sites task is skipped:
https://github.com/theiagen/public_health_bioinformatics/blob/5a68417767bbb53f6b6a303e22c8092e4f8b4031/workflows/phylogenetics/wf_snippy_tree.wdl#L70-L74
Then the Gubbins polymorphic FASTA is used as the input to snp-dists instead (assuming use_gubbins is left at its default of true):
https://github.com/theiagen/public_health_bioinformatics/blob/5a68417767bbb53f6b6a303e22c8092e4f8b4031/workflows/phylogenetics/wf_snippy_tree.wdl#L104-L106
I'm not sure if the Gubbins polymorphic FASTA is based on a pan genome alignment rather than a core genome alignment, but it looks like a pan genome alignment since there are gaps (-) in the alignment FASTA. It seems like the default behavior should be to use the core genome alignment from snp-sites.