Multiple frameshifts in complete assembled bacterial genomes

rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes

GNU General Public License v3.0

536 stars, 132 forks

I have assembled a few complete bacterial genomes from nanopore long reads using Unicycler in bold mode, followed by multiple rounds of Pilon polishing with Illumina MiSeq reads. When we submitted them to NCBI, their annotation flagged multiple pseudogenes (more than 10%) in our genomes. They told us this may be due to frameshifts caused by insertions or deletions in the genome sequence. I have also run into a related issue with SNP calling: with draft genomes (Illumina reads only, assembled with SPAdes) we get SNPs in the range of 100-200, while with the complete genomes we get SNPs in the range of 3000. I suspect there is a problem with either the assembly or the nanopore sequencing. The nanopore run produced 4 GB of output in 20 hours, and basecalling was done with Albacore. Kindly guide me on where I am going wrong and why I am getting so many frameshifts in the assembly.
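To show why leftover indels would produce the pseudogene calls described above, here is a minimal Python sketch. It assumes a made-up 18-base toy gene and the standard codon assignments. The names `translate`, `gene`, and `mutant` are hypothetical, and nothing here is part of Unicycler, Pilon, or NCBI's annotation pipeline. It only illustrates that deleting a single base shifts the reading frame and can introduce an early stop codon.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy gene (hypothetical): ATG GCC TTG AAC GTC TAA
gene = "ATGGCCTTGAACGTCTAA"

# Simulate a single-base deletion, the kind of error that can remain
# in a long-read assembly if polishing does not correct it.
mutant = gene[:3] + gene[4:]

print(translate(gene))    # MALNV
print(translate(mutant))  # MP (frame shifted, early TGA stop)
```

A gene truncated like this would likely be reported as frameshifted or as a pseudogene during annotation, and the same uncorrected indels would also show up as extra differences when comparing against Illumina-only assemblies.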


Multiple frameshifts in complete assembled bacterial genomes #115