ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Invalid Seed and Subsampled fraction: 0.00 % #121

Closed shearingham closed 4 years ago

shearingham commented 4 years ago

Hello!

I want to use NOVOPlasty to assemble an invertebrate mitogenome from Illumina RNA-Seq data, but I can't get it to run properly. I'm working with Ubuntu 18.04 on the Windows Subsystem for Linux on Windows 10 Pro, with 6 cores/12 threads at 4 GHz, 32 GB of RAM, and Perl 5.26.1. The log file is as follows:

-----------------------------------------------
NOVOPlasty: The Organelle Assembler
Version 3.8.1
Author: Nicolas Dierckxsens, (c) 2015-2019
-----------------------------------------------

Input parameters from the configuration file:   *** Verify if everything is correct ***

Project:
----------------------
Project name          = TEST_NEW
Type                  = mito
Genome range          = 13000-15000
K-mer                 = 39
Max memory            = 30
Extended log          = 0
Save assembled reads  = no
Seed Input            = /mnt/d/Uni/Berni_Bakkarbeit/COX_1.fasta
Extend seed directly  = no
Reference sequence    = 
Variance detection    = 
Chloroplast sequence  = 

Dataset 1:
----------------------
Read Length           = 248
Insert size           = 326
Platform              = illumina
Single/Paired         = PE
Combined reads        = 
Forward reads         = /mnt/d/Uni/Berni_Bakkarbeit/FASTQ_DATA/SRR4039023_1.fastq
Reverse reads         = /mnt/d/Uni/Berni_Bakkarbeit/FASTQ_DATA/SRR4039023_2.fastq

Heteroplasmy:
-----------------------
Heteroplasmy          = 
HP exclude list       = 
PCR-free              = 

Optional:
----------------------
Insert size auto      = yes
Insert range          = 1.9
Insert range strict   = 1.3
Use Quality Scores    = 

Subsampled fraction: 0.00 %

Retrieve Seed...

INVALID SEED, PLEASE TRY AGAIN WITH A NEW ONE

My input data looks like this:

@SRR4039023.1 1/1
ATTGTGTCCAATTGCGAACTTTCTAATGTGGGTCTAGTATTCAATCCATTCATTTCATTTTGTTCTGCGGTTATATTAGTGAGCGAACCGAGTTCTATATTGATTAAGTGCGGACGCGTCACTTCAACAGAAACACTTGGCAACCAACTCGCATATCGCGGGCCATCAAAATGGAAAAAGTGGTTCGTTCTGCCGGCGTTAGCCGTTCCTCTATTATTATTATCACGAGTNTTTCTCTCTGTATTACTTTC
+
AACCBFFFFFFFGGGGGGGGGGHHGHHHGHGGGHHHHHHHHHHHHHHHHHGHHHHHHHIIHIHHHHHHGGGGGHHGHGHGHHHGGGGGGGFGFFHHHHHHHH4GHHHHHHHGGGGGGGGGGGHHHHHGHHGHHHHHHHHGGHHHHGHGHHGGGGGHHGGGGGGGGHHHGHBGFFFGGGGGGGGGGGGGGGGFGGGGFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFF:#;;A9EFFFFFFFFFFFFFF0
@SRR4039023.2 2/1
AAACTGATAAAATGTTTTCGACAAAGATGTTGTTCCTCACTTTTCTCGGAGTTTTAGTTATTATATCAATGACTGATAACTCAGCATCGGCTGATTACTTGTCTGGTTGTTTTAAGGGTCCGTGTTATTCGGACACCAACTGTAATGGAGTTTGTAAGGGTTGCGACAATAATCCAAAAGGCGGTA
+
CCDDCFFFFFFFGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHGGGGGHHHHHHHHHHHHHHHGGHHHHHHHHGHHGHGHHHHHHHHGGGHGHHGHHHHHHHHGHHHHHHHHHHHHGCHFGGGGHHHHHHHHHHHFHGGGG
@SRR4039023.3 3/1
GCGCAATATTTGAATTCTTATCAGCTGTTGGTATAAAAACCCATTACATTAGTCGCGTTAAAAACAATGAAAGCGCATTTGTAGCGCAGAAGTGTTCAATGGTGCCAATAGAATGGGTTGCGAGACGAGTGGCCACCGGTTCTTACTTAAAACGCAATCCTCATGTTAATGAAGGATACCGATTCTCGCCACCACTTTTGGAAACTTTTTTCAAAGATGACGCCAATCATNATCCTTATTGGTCGATACCT

and my seed is the cox1 gene like this:

>NC_011574.1:1-1534 Steganacarus magnus mitochondrion, complete genome
ATGCGATGGTTTTATTCAACTAATCATAAAGATATCGGAAGTTTGTATTTAATGTTTGGTGTATGAGCGG
GTATTTTAGGGTCTTCTTTAAGTTCTTTAATTCGTTTAGAACTTGGACAAATTGGTTCAATTTTATCAAG
AGATCAAATTTATAATGTTTCTGTAACTTCTCATGCTTTTATTATAATCTTTTTTACTGTTATACCTATT
ATAATTGGAGGTTTTGGAAATTGGATAATCCCTTTAATAATTGGATCTCAGGATATAGCTTTCCCACGAA
TGAATAATATAAGTTTTTGGTTTCTTCCTCCTTCTTTATTGTTTCTTATATCTTCTTCTTTTTGTGGACA
AGGAGTGGGTACAGGTTGAACAGTTTATCCTCCCTTGTCTAATATTTTATTTCATTCAGGTTATTCAGTA
GATTTGTCAATTTTTAGTCTTCACATAGCTGGGGCTTCTTCTATTTTAGGGGCTATTAATTTTATCACTA
CAATTTTAAATATGAAATCTAGGTTTTTAAGTTATGATTGTATACCTTTATTGATTTGATCTATTTTTGT
AACTGCTATTCTTTTGTTACTTTCTCTTCCTGTATTAGCAGGAGCAATCACAATGCTTTTAATTGATCGA
AATTTTAATACTTCATTTTTTGATCCTTCAGGTGGAGGTGACCCCATCTTATTTCAACATTTATTTTGGT
TTTTTGGTCATCCTGAGGTTTATATTTTAATTTTACCGGGTTTTGGAATTGTTTCTCATACAATTTCTTA
TTATTCTGGAAAAGAAACTCCTTTTGGAAGATTGGGTATAATTTATGCAATGGTTTCAATTGGGTTTTTA
GGATTTATTGTATGAGCTCATCATATGTTTACAATTGGAATAGATATTGATTCTCGGGCTTATTTTACAG
CTGCTACAATGGTAATTGCTGTTCCTACTGGGGTAAAAGTTTTTAGTTGGGTTGCTACTATCTTGGGTTC
TAGATTTTCTATAGATGTCCCATTATATTGGACTTTAGGGTTTATTTTTCTTTTTACAATAGGGGGATTA
ACTGGTATTATTCTTTCTAATTCTTCTCTTGATATTTCTCTTCATGATACATATTATGTTGTTGCTCATT
TTCATTATGTTCTTTCAATGGGGGCTATTTTTGCTATTATAGCTGGAATTTTTCATTGGCTTCCAGTTAT
GTATAATATTTCTTTTAACCCTAAAATTTTAAAAGTTCAGTTTTATTCTATATTTGTAGGAGTTAATATA
ACTTTCTTTCCTCAACATTTTCTTGGATTAAACGGAATACCTCGACGGTATTCTGATTATCCTGATGCTT
TCACTTATTGAAATTTGGTTTCTTCTATTGGTTCTTATATTTCGGAAATTTCGATTATTCTTTTAATTTG
GGTTTTTTGGGAAAGAATATCATCTAGTCGGGAAAAATTAAGATATTTTTTTCTTGTTTCTGGAATTGAA
TGGATAAATATGTATCCTGTAGAAGAACATACATTCAATCAATCTTGTTTTTTAATAAAATTTT

The weird thing is that, while it is running, the terminal shows several hundred lines containing @LINE before reporting that the Subsampled fraction is 0%.

[image: screenshot of the terminal output]

Do you have any idea where I'm making a mistake?

Best regards, Bernhard

UPDATE:

I managed to get it to run by converting my FASTQ files into FASTA. I'm currently in the process of finding the right k-mer size, etc.

ndierckx commented 4 years ago

I think the read IDs are not compatible. I will look at it today and upload a new version.

ndierckx commented 4 years ago

Thanks for all the info already, but could you also send the first 3 reads of your reverse file?

shearingham commented 4 years ago

Of course, here are the reverse reads:

@SRR4039023.1 1/2
GAAGAAAGTAGAGAACAAATTGAAAGTAATACAGAGAGAAATACTCGTGATAATAATAATAGAGGAACGGCTAACGCCGGCAGAACGAACCACTTTTTCCATTTTGATGGCCCGCGATATGCGAGTTGGTTGCCAAGTGTTTCTGTTGAAGTGACGCGTCCGCACTTAATCAATATAGAACTCGGTTCGCTCACTAATATAACCGCAGAACAAAATGAAATGAATGGATTGAATACTAGACCCACATT
+
AAAAAFFFFFFFGGGGGGGGGGFGHHGHGHHHHHHHHGGHHHGHHHGHHHHHHHHHHHHHHHHHGHHHGGGGGHHGGGGGGGGGHHGGGGGHGHHHHHGHHHHHHHGHHHGFHGGGGGGFHFGGGCFHHCHGHGHHEHHHHHHHGHHGH1FGHHGGGCDGCFGCGGHHHHHHBBFFFGGGGEFDGFDGAAAFGGGFGEFGGFFFAFFFFFFFFFFFFBBFFFFFFFFFFFFFFFFFFFBFF/EEDFEF
@SRR4039023.2 2/2
TACCGCCTTTTGGATTATTGTCGCAACCCTTACAAACTCCATTACAGTTGGTGTCCGAATAACACGGACCCTTAAAACAACCAGACAAGTAATCAGCCGATGCTGAGTTATCAGTCATTGATATAATAACTAAAACTCCGAGAAAAGTGAGGAACAACATCTTTGTCGAAAACATTTTATCAGTTTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA
+
BCCCBCCCCFFFGGGGGGGGFGGHGGGHGHHHHHHHHHHHHHHHHHGGHHHGFGHHGGGGGHHHGGGGGGFHGHHHHHHHHGGHHGHHHFGHHHHHHHGGGGGHHHHFHHGHHHHHHHHHHGHHHHHHHHHHHHHHHHHGGGGGGHH?FFFHHHHHHHHHFHHHHGHDECEGHHHHHHHHHHFGHHFHHFHHGGGGHGEGCG?DG9FFGGGFEGGGFGFFFGG0FFGG9DFAFFFFF@DFFFFBBBFFBD
@SRR4039023.3 3/2
TATTATCAAGCCATCGATATCAAGTTTAGCTCCAATTATTGTAGGTATCGACCAATAAGGATCATGATTGGCGTCATCTTTGAAAAAAGTTTCCAAAAGTGGTGGCGAGAATCGGTATCCTTCATTAACATGAGGATTGCGTTTTAAGTAAGAACCGGTGGCCACTCGTCTCGCAACCCATTCTATTGGCACCATTGAACACTTCTGCGCTACAAATGCGCTTTCATTGTTTTTAACGCGACTAATGTAAT

Just for info, these are the raw reads, downloaded from the European Nucleotide Archive, without any adapter or quality trimming.

ndierckx commented 4 years ago

These reads should work. Did you do some trimming on them? I have noticed that some software changes the spacing of the read IDs, which can cause problems.
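A quick way to see whether the forward and reverse read IDs still line up is to compare the headers record by record. The following is only a minimal sketch: it assumes strict 4-line FASTQ records and that mates differ only in a trailing /1 or /2, as in the reads posted above. The function names are made up for illustration, and this is not how NOVOPlasty itself parses read IDs.

```python
import re


def read_ids(fastq_lines):
    """Yield the header (without the leading '@') of every 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            yield line.rstrip("\r\n")[1:]


def pair_key(header):
    """Reduce a header to a mate-independent key.

    Assumption: mates differ only in a trailing /1 or /2, e.g.
    'SRR4039023.1 1/1' versus 'SRR4039023.1 1/2'.
    """
    return re.sub(r"/[12]$", "", header)


def first_mismatch(forward_lines, reverse_lines):
    """Return the first (forward, reverse) header pair whose keys differ, or None."""
    for fwd, rev in zip(read_ids(forward_lines), read_ids(reverse_lines)):
        if pair_key(fwd) != pair_key(rev):
            return fwd, rev
    return None
```

Open file handles can be passed directly, since they iterate line by line. A result of None means every compared pair matched under the stated assumption; any change in spacing on one side only would show up as a mismatch.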

shearingham commented 4 years ago

I tried the FASTQ file with and without trimming; neither worked. As I wrote in my update, the only way I managed to run the program was by converting the FASTQ files to FASTA with seqtk. With FASTA files it seemed to run without any issues.
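For readers without seqtk, the conversion described here can be approximated in a few lines of Python. This is a sketch under the assumption that every record occupies exactly four lines (no wrapped sequences); it is a stand-in for the conversion, not seqtk's implementation, and the function name is hypothetical.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (header, sequence) for each 4-line FASTQ record."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, sequence, _plus, _quality = record
            if not header.startswith("@"):
                raise ValueError(f"unexpected FASTQ header: {header!r}")
            # Keep the read ID unchanged; only swap '@' for '>'
            # and drop the '+' and quality lines.
            yield ">" + header[1:]
            yield sequence
            record = []
```

The quality line is simply discarded, so this only makes sense when quality scores are not going to be used downstream.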

ndierckx commented 4 years ago

I downloaded the dataset and it worked for me, so maybe it's a problem with your config file. Could you download it again? Or maybe it is a Windows problem; some text editors caused problems before, although that should have been resolved.
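The thread does not say what kind of text-editor problem is meant. One common candidate on Windows is CRLF line endings, so the sketch below checks a file's raw bytes for them. That this was the cause here is purely an assumption, and the function names are illustrative.

```python
def count_crlf_lines(raw_bytes):
    """Count lines that end in a Windows-style CRLF."""
    return sum(1 for line in raw_bytes.split(b"\n") if line.endswith(b"\r"))


def to_unix_line_endings(raw_bytes):
    """Return the contents with every CRLF replaced by LF."""
    return raw_bytes.replace(b"\r\n", b"\n")
```

To use it, read the config file in binary mode (for example `open(path, "rb").read()`, where `path` is wherever your config file lives) and pass the bytes to `count_crlf_lines`. A non-zero count means the file was saved with Windows line endings.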

shearingham commented 4 years ago

I will try downloading the config file again and get back to you. I will also try re-downloading the FASTQ files from different sources, and try downloading the SRA file and converting it to FASTQ with the SRA Toolkit.

ndierckx commented 4 years ago

I downloaded them from https://www.ebi.ac.uk/. If you send me the seed, I can run it for you to see if it assembles.

shearingham commented 4 years ago

I used the following gene as seed:

>NC_011574.1:1-1534 Steganacarus magnus mitochondrion, complete genome
ATGCGATGGTTTTATTCAACTAATCATAAAGATATCGGAAGTTTGTATTTAATGTTTGGTGTATGAGCGG
GTATTTTAGGGTCTTCTTTAAGTTCTTTAATTCGTTTAGAACTTGGACAAATTGGTTCAATTTTATCAAG
AGATCAAATTTATAATGTTTCTGTAACTTCTCATGCTTTTATTATAATCTTTTTTACTGTTATACCTATT
ATAATTGGAGGTTTTGGAAATTGGATAATCCCTTTAATAATTGGATCTCAGGATATAGCTTTCCCACGAA
TGAATAATATAAGTTTTTGGTTTCTTCCTCCTTCTTTATTGTTTCTTATATCTTCTTCTTTTTGTGGACA
AGGAGTGGGTACAGGTTGAACAGTTTATCCTCCCTTGTCTAATATTTTATTTCATTCAGGTTATTCAGTA
GATTTGTCAATTTTTAGTCTTCACATAGCTGGGGCTTCTTCTATTTTAGGGGCTATTAATTTTATCACTA
CAATTTTAAATATGAAATCTAGGTTTTTAAGTTATGATTGTATACCTTTATTGATTTGATCTATTTTTGT
AACTGCTATTCTTTTGTTACTTTCTCTTCCTGTATTAGCAGGAGCAATCACAATGCTTTTAATTGATCGA
AATTTTAATACTTCATTTTTTGATCCTTCAGGTGGAGGTGACCCCATCTTATTTCAACATTTATTTTGGT
TTTTTGGTCATCCTGAGGTTTATATTTTAATTTTACCGGGTTTTGGAATTGTTTCTCATACAATTTCTTA
TTATTCTGGAAAAGAAACTCCTTTTGGAAGATTGGGTATAATTTATGCAATGGTTTCAATTGGGTTTTTA
GGATTTATTGTATGAGCTCATCATATGTTTACAATTGGAATAGATATTGATTCTCGGGCTTATTTTACAG
CTGCTACAATGGTAATTGCTGTTCCTACTGGGGTAAAAGTTTTTAGTTGGGTTGCTACTATCTTGGGTTC
TAGATTTTCTATAGATGTCCCATTATATTGGACTTTAGGGTTTATTTTTCTTTTTACAATAGGGGGATTA
ACTGGTATTATTCTTTCTAATTCTTCTCTTGATATTTCTCTTCATGATACATATTATGTTGTTGCTCATT
TTCATTATGTTCTTTCAATGGGGGCTATTTTTGCTATTATAGCTGGAATTTTTCATTGGCTTCCAGTTAT
GTATAATATTTCTTTTAACCCTAAAATTTTAAAAGTTCAGTTTTATTCTATATTTGTAGGAGTTAATATA
ACTTTCTTTCCTCAACATTTTCTTGGATTAAACGGAATACCTCGACGGTATTCTGATTATCCTGATGCTT
TCACTTATTGAAATTTGGTTTCTTCTATTGGTTCTTATATTTCGGAAATTTCGATTATTCTTTTAATTTG
GGTTTTTTGGGAAAGAATATCATCTAGTCGGGAAAAATTAAGATATTTTTTTCTTGTTTCTGGAATTGAA
TGGATAAATATGTATCCTGTAGAAGAACATACATTCAATCAATCTTGTTTTTTAATAAAATTTT

COX1.zip

This is an assembled COX1 gene from the same species.

The whole assembled genome is also available under accession NC_011574; I will also make it available here:

>NC_011574.1 Steganacarus magnus mitochondrion, complete genome
ATGCGATGGTTTTATTCAACTAATCATAAAGATATCGGAAGTTTGTATTTAATGTTTGGTGTATGAGCGG
GTATTTTAGGGTCTTCTTTAAGTTCTTTAATTCGTTTAGAACTTGGACAAATTGGTTCAATTTTATCAAG
AGATCAAATTTATAATGTTTCTGTAACTTCTCATGCTTTTATTATAATCTTTTTTACTGTTATACCTATT
ATAATTGGAGGTTTTGGAAATTGGATAATCCCTTTAATAATTGGATCTCAGGATATAGCTTTCCCACGAA
TGAATAATATAAGTTTTTGGTTTCTTCCTCCTTCTTTATTGTTTCTTATATCTTCTTCTTTTTGTGGACA
AGGAGTGGGTACAGGTTGAACAGTTTATCCTCCCTTGTCTAATATTTTATTTCATTCAGGTTATTCAGTA
GATTTGTCAATTTTTAGTCTTCACATAGCTGGGGCTTCTTCTATTTTAGGGGCTATTAATTTTATCACTA
CAATTTTAAATATGAAATCTAGGTTTTTAAGTTATGATTGTATACCTTTATTGATTTGATCTATTTTTGT
AACTGCTATTCTTTTGTTACTTTCTCTTCCTGTATTAGCAGGAGCAATCACAATGCTTTTAATTGATCGA
AATTTTAATACTTCATTTTTTGATCCTTCAGGTGGAGGTGACCCCATCTTATTTCAACATTTATTTTGGT
TTTTTGGTCATCCTGAGGTTTATATTTTAATTTTACCGGGTTTTGGAATTGTTTCTCATACAATTTCTTA
TTATTCTGGAAAAGAAACTCCTTTTGGAAGATTGGGTATAATTTATGCAATGGTTTCAATTGGGTTTTTA
GGATTTATTGTATGAGCTCATCATATGTTTACAATTGGAATAGATATTGATTCTCGGGCTTATTTTACAG
CTGCTACAATGGTAATTGCTGTTCCTACTGGGGTAAAAGTTTTTAGTTGGGTTGCTACTATCTTGGGTTC
TAGATTTTCTATAGATGTCCCATTATATTGGACTTTAGGGTTTATTTTTCTTTTTACAATAGGGGGATTA
ACTGGTATTATTCTTTCTAATTCTTCTCTTGATATTTCTCTTCATGATACATATTATGTTGTTGCTCATT
TTCATTATGTTCTTTCAATGGGGGCTATTTTTGCTATTATAGCTGGAATTTTTCATTGGCTTCCAGTTAT
GTATAATATTTCTTTTAACCCTAAAATTTTAAAAGTTCAGTTTTATTCTATATTTGTAGGAGTTAATATA
ACTTTCTTTCCTCAACATTTTCTTGGATTAAACGGAATACCTCGACGGTATTCTGATTATCCTGATGCTT
TCACTTATTGAAATTTGGTTTCTTCTATTGGTTCTTATATTTCGGAAATTTCGATTATTCTTTTAATTTG
GGTTTTTTGGGAAAGAATATCATCTAGTCGGGAAAAATTAAGATATTTTTTTCTTGTTTCTGGAATTGAA
TGGATAAATATGTATCCTGTAGAAGAACATACATTCAATCAATCTTGTTTTTTAATAAAATTTTATGCCA
ATTTGAATATCCCTTTCTTTTCAAAATAGTTTTTCCCCAGAGATAGAAAGAATTAACTTTTTTCATGATT
TTTCTATAACTATAATTATTTTTATTTTTTTGTTAAGAAGATATTTTATTTTTTCTTCTTTAATAGGAGA
ATTTTTTGATAACTTTTCTATTGAAAGTCATGAATTGGAGTTGTTTTGAACTTTTACACCAACTTTGTTT
TTAATATTTTTGGCAATTCCTTCAATGAAGGTTTTATACATGACGGAAGATTCACAAAAACCGTGTTTTA
CTTTTAAAATCACTGGGCATCAGTGGTATTGATCTTATGATTTTATAAATTGTCTTGAAGAGGTTGATTC
ATTTTATGATTCTTCTTTAATATCCCGTTTGTTGAAAGTAAGAAATGTTGTTAATTTACCATCTTTTTCA
AATATTCGAGGTTTAGTAACATCTTCAGATGTAATTCATTCTTGGTCTATTCCTTCTATGGGTGTAAAAT
CTGATTGTTCCCCTGGACGTTTAAATCAAGTTTTTTTAATATCTAAGTTGAATATACTAACTGTTGGTCA
GTGTTCAGAAATCTGCGGTGCTAATCACTCTTTTATACCCATCGTTTTAATTTTTAGGATTGTATTAACC
ATTTTTAGTTTTGTCAATACTTCTATAAGGTTAATTCCTCAAACTAAACCTATAAATTGATTAGTTTTAT
TTTTTTATATTATTTTTATTTTTTTATCAATTTCTGTTGTTATTTTATTAATAGATAATTCAACTCCAAA
TATTGGAAGAAAAAGTTTTAATAAGTTGAATTTAGAATGATAGTTAACTTATTTTCTATTTTTGATCCTT
CTTTTTCGGTTAACTCTTTTTCTTTAATTGTTATTTTTTTACCTTTATTATACTTTTTTTTTATAGAATA
TAATATTAATTTATTAAATTTTATACGTGAAATTTTTGTTATTTTTTTAATTAATGAGTGTAGTCAGGTT
TTAAAGAGTAAATACAATATTTATTCTTATCATTTTTCTCTTTTTTTCATTTTTCTAATTTTTAATTTAT
TATCTCTCTCACCTTTTATTTTCCCTTTAAATTCTCATATTTCTTTAGTTTTTCCTATTGGTTTAATATT
TTGATTAGTTCCAATTTTATCAATTATTCTTAAAATTCCTTATTTTATAGTAGGATCTTTCCTTCCTCAA
GGAACTCCTATTTTTCTTATACCTTTTATGGTTTTAATCGAAATAATCAGTTTAATTATTCGTCCTATTA
CTTTATCTGTCCGATTGACTGCTAATATTACAGCTGGACATCTTCTTATTTCAATTTTATTAAATTTTTT
GTTATCTTTTTCATTTAATCTTTTATCTTTAATTCCTATTATAATAATAATAGTTTTAGAATTTGGAGTA
GCTATTATTCAATCTTATGTGTTTTTTACACTTTTAAATATATATATTTCTGAAATTTAATGATATTTAA
GTTAAACACATTTCATTTTGTATTTATAAGACCTTGACCTTTATTTGTTTCATTTTCTTCTTTTAACTTT
TTTTTATCTATTATTATTTGAATAAATTATGGAAATTTTATTTTTTTAATTATAGCGATATTTAATATTA
TATTTTGTTCGTATGTATGATGACGGGATGTAGACCGAGAATCTTCTTTTTTAGGGTCTCATTCTCTCAA
GGTAAAATATGGATTAAAATTTGGTATAATTCTTTTTATTGTTTCGGAGGTAATATTTTTCTTTTCTTTT
TTTTGATCTTTTATTCATGGGAGCTCTTCATCTGATGTAACAATAGGAGCATCATGACCACCTGAACAAG
TTTTTCCCTTTGATCCTATGAGAGTTCCTTTAGTAAATACTATTGTTCTTCTTTCTTCAGGAGCTTCTGT
TACTTGGAGTCATCATTCTATTTTGAAAAATAAAATAATAGTTTCATCTATTTCTCTTTTTTTTACTATC
ATGTTGGGGGTATTTTTTACTATTCTTCAATATTTAGAATATTTAGGTTCTCCGTTTTCTATTTCAGATA
GAGATATAGGCTCAACTTTTTTTATAGCTACAGGATTCCATGGAATTCATGTGATTATTGGTTCTACATT
TCTAATAATTGAGTTTATTAAAAATGTTTTTTATGTAAACACCAACTCACAAATTGTAGGTTTTGAGTGT
TCTGCTTGATATTGGCATTTTGTTGATGTTATTTGATTGTTTCTTTACTCTTTGATTTACTGATGGGGTT
CGTAATTCCTTACATTCTTATTTCCAATTAGATTAATAGGGTTGATTAAATTCACTCTTTTTGTTTTTTC
ATCTTTTATTTTAGTTCTATTATTTTATAATCTTGTATATTTTTTATTTTTTCGGTTTGAGGGTTTTAAT
ATTTGAAGCCCTTTTGAATGTGGTTTTAATAATAACTTTTTTGGGAATAATCCTATATCCTATCAATTTT
TTGTAATTGGTGTTTTGTTTTTAATTTTTGATGTGGAAATCGCTTTAATTATTCCATTTTCTGTAGAAAA
ATGAATTGACAAAAATATGAATTCAATAATTATTTTTTTACTTATTTTAATTTTTGGGGTAGCATATGAA
TGAAAGAGTGGGAAGATTCAATGGTTAAAGTGGATGAGTTAAACTTTTTTGCATTAAGTTAAAATCACTA
GAGTGATTAAAGAACTTCTAATTCTTGAAAGATTCTATCACACTCTTTAGTGAGTCTTTTTTTAGTGTTT
TCAGCACTATTTTTTGACTTAACATTTTTTTAGAATTTAAGATTCTAATTTGCTTGAGCTTGTGTTAAGA
AAAAATTACCCAGAGTGAGGGTATAGATAAAATTAAGAGTTTTAAAGAGTCTGAAGGAGATAGAATTCTT
TTATATCTTACTATTATAGGAGTCTTTTTAATTATGGAAATGAATTCAACTCAAGATCTTTCTATTTTAA
AGATTGAAAAATATTTTAATATATTTGATATTTCACTTTTTCTTAGCCAGTTTATAAACCTCCCATCTAA
TCTTAAAATCTTAAATTTATTAATTTTTAATTTAATTTTTTCATACATTAATAATAATAAAATCAATAGA
ATTCCCTTTATTAAATCAAATAGATTTGGCAAAAATAAAGTTTCGGTAAATATATAAAATATCCAACTAC
CTGAAAATAAACAGAATTTTAATAGAAAGATTAAGGAAATAGAAAAATTTAATCTTCTTTCAGTTCTAAA
ATTTGATTTGAAAGAGGAAGAAAAAGTTAGTGAAATAATTATAATTCGTGAAGTGTAAATAATAGTTAAA
ATACAACCTAAAAAGAAAATTAACTCAATAATAAGATTGTAAAAAATAGAGCTACTTGATAAAATAATAT
CTTTTGAGTAAAATCCTAAAGAAAAGGGGAACCCTATGAGTCTTAATCTTCTAACTAATATAGAAGTAAG
AGTTAACTTATTATTCAAAAAACCACTTATAAAACGGAAATCTTGACCTCCACTTAATGAATAAATTAAT
CCACCAACAGAAATAAATAATATAGATTTAAAAAATGAATGAAACATTATGTGAATAAAAGAAATTTTCC
AAAGTCCTAAAGATACTACTAAAATTATAAAACCAAGTTGTCTAAGAGTAGATATAGCCACCAATTTTTT
CAAATCTGTTTCTACAATGGCAGAAACACCTGCTATTAAAAAGGTTATTAATGAAAAAGTTAAAACAAAG
TAAATTAAATTGTAAAAGAGGAAATTAAAACGAATTATTACAAAAATTCCTGCTGTAACTAAAGTAGACG
AATGAACTAATGCAGAAACGGGTGTTGGAGCTGCTATAGCAGCTGGAAGTCAAGAAGAAAAGGGAATTTG
AGCTCTTTTAGTAATTGCTCCTCTAATAAAAAATAAAGCTAGTATTTTTCTTCTGGCAGAGAATAGATTT
ACTTCTAATCCTCCTAACTGTATTATATAAAAAGACATACATAAAAATAAAGAATCCCCCAATCGGTTAG
ATAAAACTGTTAAAATTCCTGACCTAAGACTTTCTGGTTTTTCATAAAAGATGACTAAACAAAATGAAAC
TATTCCTAAACCGTCTCATCCTACTATTATAGAAATAAAATTGTTAGATACAACCAATAAAATTATCGAT
CCGACGAATAAAATTAAAGTATATATAAAACGTCGATATCTTAGAGTTCCTTTTATATAAAAGCTAGTAA
AAATTAACACAGATATTGAAACTACAAAAATGCATATTAAAAATCCTGYTCTTACGTAATCTAARGAAAA
AGAAAATGAGTAATTATTTCTTGTATTTAGAAAAATTGGAATTTCTAAAATTATTCTTCTATTTAAACAA
TTTTTAATTATTATAATAGAAATAAATGAGAAAATTAACATGCAAATAGTACTTGAAAAAATTATTAGAC
TTTTCTTCAGTTCCACAGACTGTTTTAAAAATCTAAACCAATAATAGAAAGAAAAGATAGAAAATTAAAC
TGAAAATTAATGATAAATCTATTTTTCTAATATTTATATACCCAAAATTAGAAGGATTTCCTTGACTGAA
ATAAATAAAATAAAAGATTATATATAATCCAGATAAGAAAAAAATAAAAAGTGAAACTGGAATTAAAATA
TGATTAATCAATAAAGCTCTAATACCCAGTATAAATTCAGAAAAAAAAGATAGTAATGGGGGTATTCCTA
AAGATGAAATTATAATTAAAAATCCTGAGAGAAACAAAAGGGGATGCAAAGAGTTAACCCCTTTTATTAT
AATTAAACTCCGTGAATTTTGTCGATCATAAAATATATAAACTAAGAAGAACATCAAAGGTGAACTCACT
CCGTGAGATAAAACTATATACATACATCTAGAAAATCCAGTGTTTGATCCCCTTAGGGCTCCATTAGTGC
ATATTCCTATATGAACAACTGAAGAGTAAGCAACAAGAGACTTACAATCTACTTGACGATAACAAATTAT
AGCAACAATTAAAGATCTAACTCATCCCACACATGAAAAAATCCTAGAAAAATGAGTGAGAGCTCTTATT
TTTATCAAAGAAACTCGAAAGATCCCGTAAATTCCAATTTTTAAAAGAATCCCAGATAGGAAAATTGAGC
CAGATAAAGTAGCTTCCACATGGGCTTTAGGAAGTCACATGTGAAGACCAAAAAGAGGAATTTTTACTAA
AAATACAAGTAAAATAAATATTCATCACATATTTTGATTAAATAAAATTATTATCATAGATAAAATTTGT
TCTTCATCTAAAGAAGAAAATATTAATACTAGTAATGGAATAGAAGTTGCTAAAGTAATGAAAAATATAT
AGTATGAAGCCATCTCTCGGTTTATTTTGCTACTAAGAGAGAAAAGAAAGAAGAATATAAATAAAAAGAT
GATTTCAAAATTAATATAAAAAATAAATATTGAATCAAACACAAAACATATATTTAAAAGAATAAATATA
ATTATTAAAAATGTTTGTTCGACTTTCTTTTCATCCCTTTGTTTTATTGATAAAAAAACTGAAAGAAAAG
AAAATACTCTTACGATAAAAACCAATCTAAGGAATAAAGTCCTTATAATATCGATATTTAAAAGTCCAAA
TGAATTATTTAAAGATATTATTCTGATAGAAAGAAAGAAAAAAATTGACAAAGAAGATAAAAATGAAAAG
TCAAATTTTAACATAATAGAACAAAAAGAAACTGTTAATAAAAATTTAATAAAAAAATGAAGAATCTTTT
CCAAAAGATCGGTAAGATCAAGTTACTAAAGTAATTCCAAATCTTCCTATACAGACAATGACTGTAATAA
GAAATAAAAATGCGTAAGAACCTCTATTGGGAGTGAATACATAAACATAGGAAAAAATTAAAGATAACAT
TGTCATTTCAAGAGAAATAATTATTATTAAATAATGAAATCTGTTAAATAAAAAGATAAATAAAGATGAA
AAAAAAATCAAAAGAACAATAAGTTCTGTCATTTCTAACATTTGGATTTTGTAAATCCTAAATAAAGAAA
TTTTTTTATTTTTATTTTTCCTTATTTTATCAATAGGAACTAAGGATCCATTTTTATTCTGTGTAATTTT
ACTATCTCTTTCAATTTCTTTATCATTTTCAATGTTATGAAAATTATGTTCAGTATGAATCCCTTCTATT
ATTCTGATTACTTTCTCTAGTGGAATAATAGTTTTAGTTATATACTCTTCATCTCTAGTTTCTTTAAAAG
GTGAGAAAAAATTGAAAATTAAAAGTTACTTTTTAGCCTTAATTTTAATTTGGTTATTTAAAGAAAATAA
TAGAATGGAAGAAAAAACTGAATCTTTCTTTAAAGAATTTTCGTCTTCCATATATTTACCATTAATGGCA
TTAATTATACTAATATTTATAATTCCTATAATCGAAAATTGCTTTGTTCCCTTTAATTCACTTCAATCTT
CTTTTTAATGAAATTAATAAAATTCTACACAAGTCCTCTTTTTGATTTACCTACTCCATTTTCAATTAAC
CTTTTTTGAAATTTCGGTTTTATCCTTGGATTTATTACTACAATTCAAGTAATTAGGGGATTATTTCTAT
CTATATTATTTATCCCTTCAGAAAGGGATGCCTTTTTTTCAATCATTCATATCATACGAGATAGTAATAG
GGGTTGATTTATTCGTTTAATTCATTCCAATGTTGCTTCTTTAATTTTCCTGTGTGTTTACATCCACTTA
GCTCGTGGTTTATACACGAGATCTTTTAAAACTAAAAGTCATGCATGAAATTCTGGAGTAACTATTTTTA
TTATTGTCATAGCCGCTGCTTTCTTTGGGTACGTTTTACCTTGAGGTCAAATATCTTATTGAGGAGCAAC
TGTTATTACAAACATTGCTTCTGCTATTCCTTACATTGGTATCTACCTAGTAAAATGAATATGGGGTAAC
TTTTCTGTTAGTCAACCCACTCTAAACCGATTTTTTTCAATTCACTTCATTGTTCCTTTTTTCATATTAA
TGATAATGGTTATTCATATTATTTTACTTCACTCTACAGGATCAAGAAATCCTATAGGATCAAAAGAAGA
TATAGAAAAAATTGAATTCCAAATTATATTTACAATTAAAGATTTTTTATTTTTTTCTTCAATCTTTTCT
TTAGTTTTTATTTCAATTAGAAATTACCCAGATATTTTTATAGATCCTGAGAATATGAACGAAGCTAACC
CCCTTAAAGCTCCAGTTCACATCCAACCTGAATGGTATTTTTTATTTGCTTACGCAATTCTACGCTCGAT
TACATCGAAGCTTGGAGGAGTTTTAGCTCTAATCTTTTCAATCTTAATTTTATTTTTCATTCAAATTAAA
GGTTTAAGAAAAGTAAGAGGTAAATTTTCTCCAATTGTAAAACTTAGATTCTGAATTAAAGTTTCTTCAT
TTTTAATTTTAACTTTTGTTGGAATGAAACCTGTAGAACTTCCATTTTCCGTAATTGGAAAAGTTTCGAG
ATTTTTATATTTCTTTATTTTTGCATTATGTTATTTTATAGAACTTTGAAAGTTCTCTTTTAGTAACTTT
TAAACAAGAAATTTGCAATTTCATTAAAAAGTATCTCAACGTTTACAATTATATTCACCTTCATTTTTAT
TTCTTCAATTGTAAGATTCATAAGAAATTCATGAATCTTAGTGTGAAGAATACTTGAGGTAAATACTGTA
TGTTTTGTTTCAATTAAAATCATTAAAAAAAAGGAGAATTGGAAGATTAAAACAAAAAGAGACTTGAAAT
ATTTCCTTATTCAAAGAATTTCCTCAGCTATAATTTTAATAAGAGTTCTAATTTTAGAAGATAACATAAC
ACTAAGTTTAATGAGGAATCTTTTAATTGTATCAATCATGATTAAAATTGGTTCACCACCTTTACATCAA
TGATTTATTGAAATTATTCAATTACTTAGAAAGATTCAAGTTTTATTACTTATAACTTGACAAAAATCAA
TCCCAATCTTTTTATTAATTATAGTTTCAAATAGATTTAAGATATTCTTTGTATTTTCAAGAATCATAAT
CTCCTCAATAATAATTCCATTTGTTAAAAATATTTTTCTGGTACTTGGTTGGTCTTCTATTTTTAACAAT
AGTTGAATAATTATATCTTCCAGAATTAGAATTTTAATTACAGTAACATTCATAATTTTGTACTGATCAG
CTGTTATATTTATAATAAAAGAAGTTTTCTATTTCTTTCAATTAAATGAAAACACTAGGAAAAAAAAAAT
CTTTACTTTCAGTTTAATTTCAAATTTAGGAAGATTACCACCAACTTCTGGATTTTTAGCAAAATGATTA
GTGAGAATGAAATTAATTAAAATTGGAGAAATCATAAATCTTTTAATTATCATCATATTATCTATATGAA
ATTCTTATTCATATATACGTTGAATTTCCATCAACGCTCTTATAATTAAAGAAAGTAAAATGATTTTCCA
TAATAATTTAAAGCTAACTTTATTTCTAACTCTCTTTTTGACGCCATTTTTATTTTTGTTCTTTTTAACA
CTCTGATAAGGTGTAAATTTGAGCTCCAGACTATTTAGGATTCATAACCCTTATTAAAACTGGAAACACT
TTCACTTCCAAAAAGTGAATTTTAAAATATCCCAAAATTAAATTTACAATTTACTTAAAACACCTCGAGA
CAAAAAACTACCCCCAATGCAGAATTACTCTCGAAACTTCAGGTCTTCCTAGTGTCTTCAAAACACTAAA
TAAAAGTTTCAAGATATATTTTTATAAACAATCCTTTCGTACTAGTTTTATTTAAGAAATAAATTGGATA
AAAACCAGTCTGGCTCACGCCGACTTGAACTCAGATCATGTAAAAATTTAAAAGACGAACAGTCTACCTA
AGTAATTTTCTACTTCTACCAGGTCAATTAATCCAACATCGAGGTAGAAAACTGTTCTAAAAATTAGATC
TACAAAGAACTATTCCCCTGTTATCCCTAAAGTATTTTCTTAATAAACTCTAAAGCGTTCTATACAAAAA
TAATTGTTTATATAAACAATCATTTCCCCAATTAAATAAAATAAAAATTTTAGGGTCTTCTCGTCCTAAT
TTGAAAAATTCTTATTATCAAGAAACCTATTAATTCAAATTTCCTAAGAAAAAAAAAGGAGGAGTCTCTC
ATTCATTCAGGTTTACATTCAATAAACTAATAATTACGCTACCTTAGCAGAAAACCTGGCCTTTTAAAAT
TAACTGGCAGAGTTTACTTCTTAAAGAAGTAAACATTTTGTGAAAATTTTATCTCCCTTGAAAGTGTATA
AATTAGGAAAAATAAACTCAATTAATAAAATATAACTAAAAATAAAATTTACTTAAAATTCCATTATTTT
TTTTCAATTTTTAAAACTTAGAATTTAAACATTTCTAGTTGTCTTAACCTTTTTTAAGAAAATTTTATAA
ATAAATTTTAGTTAACCTAAAGTTCAAATTTCTAGAAAAATAAAATTATACCAAGATTACAAAGAACCAA
AGTTTTTCACTATAGAAAATTTTCACTTATTTTAACAAAAGAAAATTTTCGATTAATCTCTTAAAATCTT
GAAAATTATAAAATATTTAGGAAGAACCTTATACAAAAGGTAAAAAAAAATTTTTATAAAACATTTATGA
AAACAATAAAACTATTTATAAAACAAAGTATATAAATATAATTATTATAGGGCATATATTTAGTAGGTGT
CTTTTAAATACCTTGACCAACCCTATTTAACAGAATTCAATTAAAAAAATTAAATAAAATAAAACAATAG
GAAGAATGGCCCCTCAAGAAAATATTWTTATAAAATCATAACGAAAACGGGGAAAACCCCTACGTACTCA
AATATAAATAAAAATAAAGAAAAATATAAAAAAAAGTAAATAAAAAGATCCCAAAAAGATTAAAGAGCCA
AGAAAACTTATTAAAATAATCATACCATATTCAACAATGAAAATAACAGAAAAAAGGCCTCCCCCATATT
CAACATTAAATCCTGAAACAAGTTCAGATTCCCCTTCAGAAAAGTCAAAAGGTGAACGGTTAGATTCCGC
TATTTTAAGAAAAATTCAAAGTAAAAAAAGAGGTAAAAAAAAAAAAATTATAATATTACCCTCCTCTCAA
ACTTGGAAGTTAGCAAAACTCAGCTCCATGAATAAAAAAACAAATATTAATAAAAATAGCAAAAGAGATA
CCTCATAAGAAATAGATTGTGAAATTGATCGAACAAAACCTACTATGGAATAAGAACTAAAGGAACCATA
ACCACAAAAAAAGAGAAAGTATGGAGAAAATCTTAAAACAGAAAAAAACAAAACCATTTCTATACCTCCT
CCATATAAAACAAACCAGGAAGGAAAAAAAATTCAAAGGAAAACCATTAATATTATACCGTATATGGGAG
CAAATAAATATATATACATATTTATATATATGTAAACCCCGGCTTCCTTATTTAGTAATTTTAAAGCATC
AGAAAAAGGTTGAAATAAACCTAAAATAAAAACCTTATAAGGACCCTTACGAAATTGAGCATACCTTATA
ATTTTACGCTCTAAGAGAGTAAAAAAAGCTACAGAAACAAAAACACCAATTACTTCTATTATTAAATCAA
TAAGATACATTAACTAGATCCTCCTTCAGGAGGATCTACTTTGTTACGACTTGTCTTTAAAGAACGACGG
GCGATATGTACACTAAAACTTACTTTTCAAAAAGAAAAAAAATTAATTAAACAAATTTAAAAATAAATCC
TATCGAAAAATCTGAATAAAAGTAACTCATTAACCCCCTATTAAACCGAACATTGACCTGATCTGATTAA
AAAAACAAAAATGTATTAAATTTGGTTACAGCAGTATACGAACAAAAAAGGGTAAAAAAGTGGGCTATCA
ATTTAAGCAACAAGTTCCTCTAAAAGAATAATAGCCGTCAAATGGCTAAGGTTTAATAAATTAATACCTT
CTTTATTTTGTAACAGGGTACCTAATCCTGGTTCTATTTATTTCTCTTTCTACAAAAAAACTTAAAATTA
AAATTTTACTCAAAATTCTTAAAAAATATACTATGTTTAATCAACTTCTCCCTAAATCTAATGTATAACC
GCAAAAGCTGGCACAATAAAAATTAAATTAATTTAAAACAAAATTAATTAAAATAAACTATTAGAAAAAA
AATAACCTATATAAATAGATTTTTAAGCAGAAAAGATACAATTAAACAATCTTGGTAAATAAGTAGTTTC
GGCCTTCTAAACCACAAATTTTAAGCTCTAACAAAGCAAGATTAATGAAATTATTGTAAATAATTAAAAA
CCAAAAATACTCAAATTTCAATCCTAAATCAAAAATATGTATTTTTGGGATTAAAAATTTCAATTTTATA
AATAATTTTAAAACTTCAGTAATCAGTTTATCTCAAGAATATAAATAATTATGATAAAAATCCGGGTTTT
ACTTACCGTGTACTATCAAATACAAACATATATTTAGATATAAAAACCAAATATATAAATATATACACCT
TTTTAAAAGATGTTGAAAACCTGATGATTAAAATTCAAGAATTAATTTTAATAAAAGATTTCGTGAAATT
CCCACATATAAATGATAATTTAGGAGTTCTGCCGGAGCTAGAACCTCAAATTATCATTCTATGGCTATCT
ATTTGGGGATAAGCAAACAAATAGGTTTAGCCTTTCTATTTAAATTAAAAAAAGGTTTCTTTTTTAAAAA
ATCTCTCTTTCTATTTTCATATTTTTTTACAGAAAGTAGAAATTTCCCATTCGTGTGTATTTTGTTCTTT
AGGTGTGATCTTATGTGTAATTTTAAACCTTTAATATATATGTATAGGCTAAACTCTTTGGATTATTTCT
CTGAATAACTTTGATTGTGTGAAAATTTACGGTTTTAAATTAGTACTTTACGAGAAACGCCCGAGTTTAG
AGTAAAGTATGTATATATAACTATTTATATACATAAGTCTAACGTTCCAGGCCCTCCCAAAAAAAAGGCC
TGGAATTTAAAGATATTAGTAAGTGGGTATATATGTCTAATGTTTCTATTAATCATCAGGTTTTCAACAT
CTTTTAAAAAGGTGTATATATTTATATATTTGGTTTTTATATCTAAATATATGTTTGTATTTGATAGTAC
ACGGTAAGTAAAACCCGGATTTTTATCATAATTATTTATATTCTTGAGATAAACTGATTACTGAAGTTTT
AAAATTATTTATAAAATTGAAATTTTTA

NC_011574.zip

I tried the assembly with and without this mitogenome as a backbone, but so far I haven't gotten any good results.

ndierckx commented 4 years ago

Hi,

I just saw that you have transcriptome data. That won't work for getting a complete assembly; you need WGS data.

shearingham commented 4 years ago

Yeah that's true. Okay, no wonder it was not working properly. Thank you very much for all your help!

ndierckx commented 4 years ago

I tried the assembly, and it does assemble regions of around 3000 bp, but then there are gaps. So if you want, you can assemble most of the mitogenome. But for this kind of data I wouldn't recommend NOVOPlasty, as it relies on continuous assembly. You could make a batch file with different short seeds from the reference mitogenome, but then you could equally use a graph assembler and filter the mitochondrial contigs. The problem that you initially posted has nothing to do with the fact that it is transcriptome data, so if it still occurs for other datasets, let me know.
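The "different short seeds" idea can be sketched as cutting evenly spaced windows out of the reference mitogenome and writing each one as its own FASTA record. How NOVOPlasty expects a batch of seeds to be supplied is not described in this thread, so the sketch stops at producing the seed sequences. The function names, the 500 bp seed length, and the 3000 bp spacing are illustrative assumptions, not recommended settings.

```python
def read_fasta_sequence(fasta_lines):
    """Join the sequence lines of a single-record FASTA, skipping the header."""
    return "".join(
        line.strip()
        for line in fasta_lines
        if line.strip() and not line.startswith(">")
    )


def seed_windows(sequence, seed_length=500, step=3000):
    """Yield (start, seed) pairs taken every `step` bases (0-based start)."""
    for start in range(0, len(sequence) - seed_length + 1, step):
        yield start, sequence[start:start + seed_length]


def seeds_as_fasta(sequence, name="seed", seed_length=500, step=3000):
    """Format every seed window as a FASTA record with 1-based coordinates."""
    return "".join(
        f">{name}_{start + 1}-{start + len(seed)}\n{seed}\n"
        for start, seed in seed_windows(sequence, seed_length, step)
    )
```

Each record's header carries the 1-based coordinates of the window, which makes it easier to place the resulting partial assemblies back on the reference afterwards.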

shearingham commented 4 years ago

Okay, thank you very much for your help! I will close this issue for now.