ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

having problems reading reference genome #73

Closed dmassardo closed 1 year ago

dmassardo commented 5 years ago

Hi Nick, I am trying to assemble mitochondrial genomes for butterflies, but I am getting only empty files and no error messages. It seems like it does not read my reference sequence. Can you help me with that? See the output below.


NOVOPlasty: The Organelle Assembler
Version 2.7.2
Author: Nicolas Dierckxsens, (c) 2015-2018

Input parameters from the configuration file: Verify if everything is correct

Project:

Project name = Test
Type = mito
Genome range = 12000-22000
K-mer = 39
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /gpfs/data/kronforst-lab/dmassardo/Mk-Hmelm.fasta
Reference sequence = /gpfs/data/kronforst-lab/dmassardo/reference/Hmel2.5.scaffolds.fa.gz
Variance detection = no
Heteroplasmy =
HP exclude list =
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 300
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /gpfs/data/kronforst-lab/dmassardo/file_1.paired.fq.gz
Reverse reads = /gpfs/data/kronforst-lab/dmassardo/file_2.paired.fq.gz

Optional:

Insert size auto = yes
Insert range = 1.8
Insert range strict = 1.3
Use Quality Scores = no

Reading Input......OK

Scan reference sequence...

ndierckx commented 5 years ago

Hi, your reference should be a FASTA file... Just remove the reference; it will assemble without one. I will upload a new version by tomorrow. It runs batches, so it can be useful if you have multiple datasets.

dmassardo commented 5 years ago

Hi Nick, It worked!! Thank you so much!

dmassardo commented 5 years ago

Hi Nick, I am having a problem with one sample. The log file gives me this error: "THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!"

So here are the first four lines of my sample:

@ERR260302.99 ILLUMINA:276:C0D97ACXX:7:1101:6412:1998 length=101
ACTCACTTTTTTGGAGATCAGAATGATTCATATCTAATATTTATTGGGTATAATCTATATATTTTTAAGACTGTAACCTGTCGCAGGTTGTAGGTTTGATT
+ERR260302.99 ILLUMINA:276:C0D97ACXX:7:1101:6412:1998 length=101
CCCFFFFFHHHHHJJJGIJJJJJJJJJJJJJIIIGIIJIJJJIIJJJIGIIJJJJJIJJJIIJJJJJIJIJIJIIIHGHHGFDEDDDACCDDDCCCDDDDC

Please let me know if you need more details.

Thanks, Darli

ndierckx commented 5 years ago

Hi,

What do the reverse reads look like?

dmassardo commented 5 years ago

Hi Nick, here is what the reverse reads look like:

@ERR260302.99 ILLUMINA:276:C0D97ACXX:7:1101:6412:1998 length=98
GTATTTTGAGTTGCTAAAGCTCCTATGAGAAACGTCGTACTTGTTACTTGAGTAGTAAAAGNTTTGTACATATTTCCTTTTTTTTTAGTTTCACTTTG
+ERR260302.99 ILLUMINA:276:C0D97ACXX:7:1101:6412:1998 length=98
@B@FFFFFHHHGHJGHIIJJJJJJJJJJIIGIJJHIJIJIJJJJJJJJJIIIGHJIGGIEA#.;CHHIIGIGHHHHHHHFFFDDDDBDDADDEDCDDD

Thanks,

ndierckx commented 5 years ago

Hi, yes, these IDs are not supported. When there is no indication of forward and reverse, the IDs should be identical to be recognized. The problem is that the length field differs between your paired reads, which makes the IDs different. I could add some code to my script but don't have the time this week, so it may be better that you quickly adjust your IDs. Do you have basic coding or terminal skills to remove everything after "length"?
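For readers hitting the same error, the ID adjustment described above could be sketched roughly as follows. This is only an illustration, not part of NOVOPlasty: the names `clean_fastq` and `open_text` are made up here, and it assumes plain four-line FASTQ records whose header lines end in a ` length=<number>` field, as in the reads posted in this thread. Header lines are picked by position (1st and 3rd of every four) rather than by their first character, because a quality line can also start with `@`.

```python
import gzip
import re
import sys

# Matches a trailing " length=<digits>" field at the end of a header line.
LENGTH_SUFFIX = re.compile(r"\s+length=\d+\s*$")


def clean_fastq(lines):
    """Yield FASTQ lines with the trailing length field removed from headers.

    Assumes strict four-line records: header, sequence, '+' line, quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (0, 2):  # '@' header and '+' separator lines only
            line = LENGTH_SUFFIX.sub("", line)
        yield line


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode ('r' or 'w')."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_length.py in.fq.gz out.fq.gz
    with open_text(sys.argv[1], "r") as src, open_text(sys.argv[2], "w") as dst:
        for cleaned in clean_fastq(src):
            dst.write(cleaned + "\n")
```

The script would need to be run once on the forward file and once on the reverse file so that both end up with identical IDs.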

dmassardo commented 5 years ago

Done Nick. It worked! Thanks

dmassardo commented 5 years ago

Hi Nick, sorry to bother you again, but I have an issue with some of my samples. The run finishes and circularizes, but the output file ends up empty. I am attaching the log file so you can take a look. I have tried different seeds, and sometimes it gives me small contigs instead.

Thanks, log_Mk-edem1_clyseed.txt

ndierckx commented 5 years ago

Hi, sorry, I forgot to answer. Could you run it again but with read length 101 (you wrote 151, but I think that is incorrect) and with an insert size of 220 or so?
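One way to check which read length to put in the config file is to tally the sequence lengths in the first few thousand reads. The sketch below is an illustration only; the function name `read_length_counts` and the default of 10000 reads are arbitrary choices made here, and it assumes four-line FASTQ records in a plain or gzip-compressed file.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(path, max_reads=10000):
    """Count sequence lengths among the first max_reads FASTQ records."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every four-line record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.strip())] += 1
    return counts
```

Calling `read_length_counts(...)` on a reads file and looking at `.most_common(3)` shows the dominant lengths; with trimmed reads, the longest common length is usually the original read length.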