theiagen / public_health_viral_genomics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of viral pathogens of concern, especially SARS-CoV-2
https://public-health-viral-genomics-theiagen.readthedocs.io/
GNU Affero General Public License v3.0

Issue with Medaka SC2 frameshifts and GISAID/GenBank rejections #136

Open kevinlibuit opened 2 years ago

kapsakcj commented 1 year ago

Wow this one is long overdue to address since it initially cropped up in April...

I'll document the proposed solution that has resolved false frameshifts in consensus assemblies generated by TheiaCov_ONT (v2.2.0) across at least one GridION seq run with 52 samples affected.

Re-assemble genomes using the existing TheiaCov_ONT workflow with an updated medaka model:

r941_min_hac_variant_g507 ← model used by the epi2me/wf-artic Nextflow workflow
r941_min_hac_g507 ← model we advised to use previously

According to ONT:

The _variant family is the set which is appropriate for the purposes of calling variants from reads aligned to a reference sequence. The non-_variant models are used for polishing draft assemblies and are not intended for calling variants. This is why wf-artic restricts the choice to the _variant family.
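For anyone who wants to sanity-check a model name before launching a run, here is a minimal Python sketch. The helper names `is_variant_model` and `to_variant_model` are hypothetical and not part of medaka or TheiaCov. It assumes the _variant name is formed by inserting `variant` before the final token, which holds for the two models listed above but may not hold for every medaka model, so confirm the result against the models your medaka install actually provides.

```python
def is_variant_model(model: str) -> bool:
    """Return True if the model name belongs to the _variant family."""
    return "variant" in model.split("_")


def to_variant_model(model: str) -> str:
    """Derive the _variant counterpart of a consensus model name.

    Assumption: the variant name is built by inserting 'variant' before
    the last underscore-separated token (e.g. 'g507'). This matches the
    two model names in this issue; it is not a guarantee that such a
    model exists for every input.
    """
    if is_variant_model(model):
        return model  # already a variant-calling model
    head, sep, version = model.rpartition("_")
    if not sep:
        raise ValueError(f"unexpected model name: {model!r}")
    return f"{head}_variant_{version}"


if __name__ == "__main__":
    old_model = "r941_min_hac_g507"
    print(old_model, "->", to_variant_model(old_model))
```

The function only manipulates the string; it does not check that the resulting model is installed.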

Currently, this solution only applies to TheiaCov_ONT users who are using Nanopore sequencing (i.e. non-ClearLabs).

This does not include TheiaCov_ClearLabs/ClearLabs platform users: the Guppy version used on that platform is older than Guppy 5.0.7, so this medaka model is not appropriate. CL users should continue to use the assemblies generated by the CL BIP workflow.
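The Guppy cutoff above can be expressed as a simple version comparison. This is only a sketch: the function names are hypothetical, and it assumes the basecaller version is available as a dotted string such as `5.0.7`, optionally followed by a `+build` suffix. It treats 5.0.7 as the minimum version, based on the statement above that older Guppy versions are not appropriate for this model.

```python
from typing import Tuple

# Minimum Guppy version assumed appropriate for r941_min_hac_variant_g507,
# taken from the cutoff described in this comment.
MIN_GUPPY: Tuple[int, ...] = (5, 0, 7)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '5.0.7' or '5.0.7+abc123' into a tuple of ints like (5, 0, 7)."""
    core = version.split("+", 1)[0]  # drop any build suffix
    return tuple(int(part) for part in core.split("."))


def guppy_supports_g507(version: str) -> bool:
    """Return True if the Guppy version is at least 5.0.7."""
    return parse_version(version) >= MIN_GUPPY


if __name__ == "__main__":
    for v in ("4.2.2", "5.0.7", "5.0.16"):
        print(v, guppy_supports_g507(v))
```

Comparing integer tuples avoids the string-comparison pitfall where "5.0.16" would sort before "5.0.7".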