ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

invalid seed #135

Closed Nickdykstra closed 4 years ago

Nickdykstra commented 4 years ago

Thank you for making this software; I've previously used it to great success. I'm currently using the latest update to assemble some crayfish mitogenomes with seeds from NCBI, but NOVOPlasty is giving me the invalid seed response. I've attached my config file: config.txt

ndierckx commented 4 years ago

It's better to send the log instead. But I wouldn't use the quality scores option, and the insert size can't be smaller than the read length...
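
The insert size rule can be checked before running the assembler. The following is a minimal sketch, assuming the config uses `key = value` lines with the keys `Read Length` and `Insert size` as they appear in the NOVOPlasty log posted later in this thread; the values in `example` are hypothetical and the helper names are made up for illustration.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict; blank values become ''."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # section titles and blank lines carry no setting
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def insert_size_ok(settings):
    """Return True when the insert size is at least the read length."""
    read_length = int(settings["Read Length"])
    insert_size = int(settings["Insert size"])
    return insert_size >= read_length


# Hypothetical values, chosen to trigger the warning.
example = """
Read Length = 250
Insert size = 200
"""
settings = parse_config(example)
if not insert_size_ok(settings):
    print("Insert size is smaller than read length; adjust the config.")
```

To check a real file, pass the contents of the config file to `parse_config` instead of `example`.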

Nickdykstra commented 4 years ago

Got it. I've made the changes you suggested, but I'm still getting the invalid seed error. Log attached: log_P.clarkii.txt

ndierckx commented 4 years ago

Hi,

I would set the insert size to 500, but I don't think that is the problem. It could be that you have no mitochondrial reads in the dataset; I've had users with that problem before, and it took a long time to find out that was the cause. Otherwise it could be a problem with the read IDs. Can you send a snapshot of the first 5 reads of the forward and reverse files (you can use 'head -n 5')?

Nickdykstra commented 4 years ago

The first lines of my forward reads are:

@M00160:40:000000000-J35KH:1:1101:22139:1116 1:N:0:GAAGCGG+CGGCTCT
GTATCTGAGCCTTGCGGTTTACCTAGCTCTTCGGTGGTGGCTACGATGTCAAGGATATCGTCTTGTACTTGGAAGGCGAGTCCGACCGCACGAGCAAAATC
+
CCCCCGGGGGGGGDFFGFGGGGGGGDGGGGGGGGGECF@FFGFGGGGGGGGGGGEEGFGGFFGGGGGGGGGGEGDGGGGGGGGGGGGGGEGEGGGDGGCGG
@M00160:40:000000000-J35KH:1:1101:22160:1117 1:N:0:GAAGCGG+CGGCTCT
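
A quick way to rule out a read ID problem is to compare the IDs of the first few records in both files. This is a minimal sketch, assuming plain 4-line FASTQ records and that mates share the ID token before the first space (or differ only by a `/1` or `/2` suffix). Which ID formats NOVOPlasty itself accepts is not stated in this thread, so this only checks that the two files are consistent with each other. The reverse header in the example is hypothetical, since only forward reads were posted.

```python
import io


def normalise(read_id):
    """Drop a leading '@' and an old-style '/1' or '/2' mate suffix."""
    read_id = read_id.lstrip("@")
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def first_read_ids(handle, n=5):
    """Return the normalised ID token of the first n FASTQ records."""
    ids = []
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:  # only the first line of each 4-line record
            continue
        tokens = line.split()
        if not tokens:
            break
        ids.append(normalise(tokens[0]))
        if len(ids) == n:
            break
    return ids


def mismatched_pairs(forward, reverse, n=5):
    """Return (forward_id, reverse_id) tuples that do not match."""
    fwd_ids = first_read_ids(forward, n)
    rev_ids = first_read_ids(reverse, n)
    return [(f, r) for f, r in zip(fwd_ids, rev_ids) if f != r]


# Forward header taken from the post above; sequence and quality shortened.
forward_text = "@M00160:40:000000000-J35KH:1:1101:22139:1116 1:N:0:GAAGCGG+CGGCTCT\nACGT\n+\nIIII\n"
# Hypothetical reverse header for the same read.
reverse_text = "@M00160:40:000000000-J35KH:1:1101:22139:1116 2:N:0:GAAGCGG+CGGCTCT\nACGT\n+\nIIII\n"

print(mismatched_pairs(io.StringIO(forward_text), io.StringIO(reverse_text)))
```

For real data, open `reads_1.fastq` and `reads_2.fastq` and pass the file handles in place of the `io.StringIO` objects.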

ndierckx commented 4 years ago

Seems fine, but it's easy to check whether you have mitochondrial reads. Take the seed you used (if it is at least 500 bp) or, even better, the closest reference, and use bowtie or bwa to align all reads to it. The genes should have plenty of reads that align. It's easy to do, and then you know.

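bowtie2-build reference.fasta reference_index
bowtie2 -x reference_index -1 reads_1.fastq -2 reads_2.fastq -S samfile.sam

(Replace reference.fasta and reference_index with your own reference file and index name.)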
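
Once the alignment has finished, the number of reads that mapped can be read from the FLAG column of `samfile.sam`. The following is a minimal sketch, assuming an uncompressed SAM file; bit 0x4 of the FLAG marks an unmapped read, and secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once. The example records are hypothetical.

```python
def count_mapped(sam_lines):
    """Return (mapped, total) counts over primary SAM records."""
    mapped = 0
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


# Hypothetical records: one mapped pair and one unmapped pair.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t99\tref\t10\t42\t4M\t=\t60\t54\tACGT\tIIII",
    "read1\t147\tref\t60\t42\t4M\t=\t10\t-54\tACGT\tIIII",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, total = count_mapped(example_sam)
print(f"{mapped} of {total} reads mapped")
```

For a real run, iterate over the open file instead, for example `with open("samfile.sam") as handle: count_mapped(handle)`. A mapped count near zero would suggest the dataset contains few or no mitochondrial reads.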

brenna-levine commented 3 years ago

Hi, I'm having the same issue. I know my files have mitochondrial reads in them, as I have previously aligned these to a reference. Here is my log file:

Project:

Project name = bed_bug_test
Type = mito
Genome range = 12000-22000
K-mer = 33
Max memory =
Extended log = 1
Save assembled reads = no
Seed Input = sequence.fasta
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 250
Insert size = 500
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = reads_1.fastq
Reverse reads = reads_2.fastq

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =

Subsampled fraction: 90.97 %
Forward reads without pair: 26
Reverse reads without pair: 0

Retrieve Seed...

INVALID SEED, PLEASE TRY AGAIN WITH A NEW ONE


The seed is:

MT882033.1
CAAAGGTAGCATAATAATTTGTTTTTTAATTGGAAACTAGTATGAATGGTCATACGAGGGATTGACTTTC
TTTATCTTACTTAAATTAATTTTATTTTTCTGTGAAAAAGCAGAGATTTCATTAGTAGACGATAAGACCC
TTTAAAACTTTATTCATTATAGAAGTATAATTTTGTTAGGTTTTAATAGTGTACTTTTTTAATGAGTTTT
GTTGGGGCGACAGGTAAATTTATTTAACTTTATTTTTGTTTTTCACTAATTAGTGTTTATTTGATCCAGA
TTTAGTTTTGATTATAAGTTTAAGTTACTT

Can you advise?

Thanks,

Brenna
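
Before trying a different seed, it can help to confirm that the seed file itself is well-formed FASTA. This is a minimal sketch of a generic sanity check: it assumes a single-record file with a `>` header line and a sequence made of A, C, G, T or N. The thread does not say what NOVOPlasty checks before reporting an invalid seed, so this is not its validation logic, and passing the check does not guarantee the seed will be accepted. The `check_seed` helper and the example record are hypothetical.

```python
def check_seed(fasta_text):
    """Return (problems, sequence_length) for a single-record FASTA string."""
    problems = []
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    headers = [line for line in lines if line.startswith(">")]
    sequence = "".join(line for line in lines if not line.startswith(">")).upper()

    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    if len(headers) > 1:
        problems.append("more than one record in the file")
    unexpected = sorted(set(sequence) - set("ACGTN"))
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    if not sequence:
        problems.append("no sequence found")
    return problems, len(sequence)


# Hypothetical seed record used only to show the call.
example_seed = ">example_seed\nACGTACGTNN\nACGT\n"
problems, length = check_seed(example_seed)
print(f"length = {length}, problems = {problems}")
```

To check the real seed, read `sequence.fasta` (the `Seed Input` value in the log above) and pass its contents to `check_seed`.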