theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[Snippy_Streamline] Use gbff downloaded by Assembly_Fetch as the reference, if `include_gbff` = true #352

Open emmadoughty opened 4 months ago



:pushpin: Explain the Request

Currently, the optional inputs for including the gbff and gff files (such as `include_gbff`) are exposed to users in the Snippy_Streamline workflow, but the downloaded files are written only to cloud storage and are not output to the data table. If the gbff (gbk) file were used as the reference, the SNPs identified in the workflow would be annotated with the gbff information (CDS name, etc.).
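To illustrate the request, here is a minimal sketch of the selection logic in Python. It is not the workflow's actual WDL: the function name `choose_snippy_reference` and its parameters are hypothetical, and it assumes only that Snippy accepts either a FASTA or a GenBank file as its reference.

```python
from typing import Optional


def choose_snippy_reference(
    fasta_path: str, gbff_path: Optional[str], include_gbff: bool
) -> str:
    """Return the file to pass to Snippy as the reference (hypothetical helper)."""
    # Prefer the GenBank file only when the user asked for it
    # and Assembly_Fetch actually produced one.
    if include_gbff and gbff_path:
        return gbff_path
    # Otherwise fall back to the FASTA, which yields unannotated SNP calls.
    return fasta_path
```

With this logic, the current behavior is unchanged whenever `include_gbff` is false or no gbff file is available.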

:books: Context

Knowing which CDS a SNP falls in is valuable for understanding where in the genome SNPs occur. This complements the shared SNPs task (https://github.com/theiagen/public_health_bioinformatics/pull/291), which identifies SNPs shared between samples. Currently, a user only knows the context of those SNPs if they supply their own reference rather than using ReferenceSeeker.
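As a rough picture of what "context" means here, the sketch below maps SNP positions onto CDS intervals. The coordinates and gene names are made up for illustration, and the helper `cds_for_position` is hypothetical; in practice the annotation would come from the gbff reference rather than from a hand-written list.

```python
from typing import List, Optional, Tuple

# Hypothetical CDS features as (start, end, name), 1-based inclusive coordinates.
CDS = Tuple[int, int, str]


def cds_for_position(pos: int, features: List[CDS]) -> Optional[str]:
    """Return the name of the first CDS containing pos, or None if intergenic."""
    for start, end, name in features:
        if start <= pos <= end:
            return name
    return None


# Made-up example data: two CDS features and three SNP positions.
features = [(100, 400, "geneA"), (650, 900, "geneB")]
snps = [150, 500, 700]
annotated = {pos: cds_for_position(pos, features) for pos in snps}
```

Without an annotated reference, every SNP is effectively in the `None` case: the position is known, but not the feature it sits in.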

:chart_with_upwards_trend: Desired Behavior

:information_source: Additional Information