ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Recovered fragment is shorter than expected #145

Closed: m-statham closed this issue 4 years ago.

m-statham commented 4 years ago

Dear Nicolas Dierckxsens,

I am running NOVOPlasty 4.2 and trying to recover mitogenome data from adapter-trimmed Illumina sequence data. So far I have recovered ~1300 bp, which is only slightly longer than my seed. If I use a seed from a different, shorter region, the result is worse. If I use the recovered ~1300 bp contig as the seed, it only recovers a short fragment.

I have tried running with and without "Use Quality Scores". My sequences are from a museum specimen, so I expect issues with sequence quality. I have also tried a different reference genome from a member of the same genus, and the result is pretty similar.


NOVOPlasty: The Organelle Assembler
Version 4.2
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: Verify if everything is correct

Project:

Project name = LA27660
Type = mito
Genome range = 12000-22000
K-mer = 39
Max memory = 20
Extended log = 1
Save assembled reads = yes
Seed Input = /home/statham/novoplasty/mitoref/SMHM_Dloop7.fasta
Extend seed directly = no
Reference sequence = /home/statham/novoplasty/mitoref/R_mexicanus_whole_mtDNA.fasta
Variance detection = yes
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 300
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = LA27660_S1_L001_R1_001_adapt_trim.fastq.gz
Reverse reads = LA27660_S1_L001_R2_001_adapt_trim.fastq.gz

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =

Subsampled fraction: 99.99 %
Forward reads without pair: 941
Reverse reads without pair: 577

Retrieve Seed...

Initial read retrieved successfully: TATGTATATCGTACATTAAATTATATTCCCCTAGCATATAAGCATGTATAATTTAATTAATTATTTACCACATAAAC

Start Assembly...

------------Assembly 1 finished: Contigs are automatically merged in Merged_contigs file------------

Contig 01 : 1299 bp

Total contigs : 1
Largest contig : 1299 bp
Smallest contig : 1299 bp
Average insert size : 300 bp

-----------------------------------------Input data metrics-----------------------------------------

Total reads : 17967510
Aligned reads : 190
Assembled reads : 184
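To put these metrics in perspective, here is a back-of-the-envelope sketch (illustrative Python, not part of NOVOPlasty). It reports the share of input reads that were assembled and uses the Lander-Waterman relation, mean depth = N × L / G, to estimate how many reads a target depth would require. The 30x target and the 16,500 bp mitogenome length are hypothetical values chosen for illustration (the length lies inside the configured 12000-22000 range), and 151 bp is the nominal read length.

```python
import math


def fraction_assembled(assembled_reads: int, total_reads: int) -> float:
    """Share of the input reads that ended up in the assembly."""
    return assembled_reads / total_reads


def reads_needed(target_depth: float, genome_length: int, read_length: int) -> int:
    """Reads required for a target mean depth (Lander-Waterman: depth = N * L / G)."""
    return math.ceil(target_depth * genome_length / read_length)


if __name__ == "__main__":
    total_reads, assembled_reads = 17_967_510, 184  # from "Input data metrics"
    share = fraction_assembled(assembled_reads, total_reads)
    print(f"assembled: {share:.5%} of input reads")
    # Hypothetical targets: 30x depth over a 16,500 bp mitogenome, 151 bp reads.
    print(f"reads needed for 30x: {reads_needed(30, 16_500, 151)}")
```

Adapter trimming shortens reads, so the real requirement is higher: `reads_needed(30, 16_500, 50)` gives 9900 for hypothetical 50 bp reads. These counts refer to reads that actually come from the mitogenome, and they are optimistic, since real coverage is uneven and can leave gaps.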


m-statham commented 4 years ago

I have attached the extended log: log_extended_LA27660.txt

ndierckx commented 4 years ago

Hi,

First, the seed is only used to retrieve one sequencing read to start extending from, so NOVOPlasty will probably only look at its first 200 bp. The length of the seed therefore doesn't matter, and that 1300 bp contig was assembled independently of the seed (unless "Extend seed directly" is used). I checked your extended log and you have very low coverage. As mentioned in the info section of the config file, you should then lower the k-mer to 23 or so. It's also best not to use variance detection at such low coverage. But it will probably never result in a complete assembly, as there are coverage gaps (as is often the case with old samples).
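The advice to lower the k-mer follows from the usual relation between base coverage C and k-mer coverage C_k for reads of length L: each read contributes only L - k + 1 k-mers, so C_k = C × (L - k + 1) / L. Below is a minimal Python sketch of that relation. It is an illustration, not NOVOPlasty's internal calculation, and it assumes a hypothetical base coverage of 20x, comparing the nominal 151 bp read with a hypothetical 50 bp read (degraded museum DNA can yield much shorter reads after adapter trimming).

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage for a given base coverage.

    Uses C_k = C * (L - k + 1) / L: a read of length L holds L - k + 1
    k-mers, so each position is covered by fewer k-mers than bases.
    Reads shorter than k contribute no k-mers at all.
    """
    if k > read_length:
        return 0.0
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    base_cov = 20.0  # hypothetical base coverage, for illustration only
    for read_len in (151, 50):  # nominal read vs. hypothetical trimmed read
        for k in (39, 23, 21):
            cov = kmer_coverage(base_cov, read_len, k)
            print(f"L={read_len:3d} k={k}: k-mer coverage {cov:5.1f}")
```

With the hypothetical 50 bp reads, dropping k from 39 to 23 raises the k-mer coverage from 4.8 to 11.2, while with full 151 bp reads the gain is modest (about 15.0 to 17.1). Smaller k-mers let overlaps be detected at lower coverage, at the cost of more ambiguity in repeats.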

m-statham commented 4 years ago

Hi Nicolas,

I tried lowering the k-mer to 23 and then 21 and recovered ~2121 bp, so a slight improvement. Looks like I need more sequencing!!
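Because the k-mer size had to be tried at several values, a small script can write one config per k-mer. The sketch below is hypothetical helper code, not part of NOVOPlasty. It assumes a template config in which each parameter sits on its own `Key = value` line, as in the parameter listing printed in the log, and it changes `Project name` per run so outputs don't overwrite each other (the attached log is named after the project, `log_extended_LA27660.txt`). The function names `set_param` and `write_kmer_sweep` and the `config_<project>_k<k>.txt` naming are illustrative.

```python
from __future__ import annotations

import re
from pathlib import Path


def set_param(config_text: str, key: str, value: str) -> str:
    """Replace the value on the first 'key = value' line of a config.

    Only the first match is changed, on the assumption that a config may
    repeat keys further down in explanatory text.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=).*$", re.MULTILINE)
    new_text, n = pattern.subn(lambda m: f"{m.group(1)} {value}", config_text, count=1)
    if n == 0:
        raise KeyError(f"parameter {key!r} not found in config")
    return new_text


def write_kmer_sweep(template: Path, project: str, kmers: list[int]) -> list[Path]:
    """Write one config per k-mer size, each with its own project name."""
    text = template.read_text()
    written = []
    for k in kmers:
        cfg = set_param(text, "K-mer", str(k))
        cfg = set_param(cfg, "Project name", f"{project}_k{k}")
        out = template.with_name(f"config_{project}_k{k}.txt")
        out.write_text(cfg)
        written.append(out)
    return written
```

For example, `write_kmer_sweep(Path("config.txt"), "LA27660", [39, 23, 21])` would write three configs next to the template; each can then be passed to NOVOPlasty with its `-c` option.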

Thanks for the prompt response.