ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Input data metrics explanation, % of mtDNA reads of the total sequence reads that mapped to the whole mtDNA #216

Closed AliBasuony2022 closed 9 months ago

AliBasuony2022 commented 9 months ago

Dear friends,

I have a question from a reviewer regarding the % of mtDNA reads, out of the total sequence reads, that mapped to the whole mtDNA in NOVOPlasty. Where can I find this information in the NOVOPlasty output, please? Is it 0.43 % (see Input data metrics below)?

Can someone explain the "Input data metrics", please? I'm a bit confused.

Below is the log file.

Kind regards, Ali


NOVOPlasty: The Organelle Assembler
Version 4.3.1
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: Verify if everything is correct

Project:

Project name = mito_1_375
Type = mito
Genome range = 15000-18000
K-mer = 33
Max memory = 64
Extended log = 1
Save assembled reads = yes
Seed Input = NC_008434.1_Vv_complete_mitogenome16813bp.fasta
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 350
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /mnt/scratch/c1845371/whole_genome/data/375_R1.fastq.gz
Reverse reads = /mnt/scratch/c1845371/whole_genome/data/375_R2.fastq.gz
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /mnt/scratch/c1845371/whole_genome/mitochondrial_genome/mito_12/

Subsampled fraction: 24.14 %
Forward reads without pair: 13259
Reverse reads without pair: 5025

Retrieve Seed...

Initial read retrieved successfully: TCTTACACCCGCCAGATCTTGCTGTCTATCTATAGATATCATTTCCTTGATATTTTATTTTTTACCGCCTCTATAGTTCGCACCAACAAAGCCAAAAACAAAAGTTAATGTAGCTTAATTAGTAAAGCAAGGCACTGAAAATGCCAAGATG

Start Assembly...

------------Assembly 1 finished: Contigs are automatically merged in Merged_contigs file------------

Contig 01 : 16521 bp
Contig 02 : 349 bp
Contig 03 : 992 bp
Contig 04 : 385 bp
Contig 05 : 881 bp

Total contigs : 5
Largest contig : 16521 bp
Smallest contig : 349 bp
Average insert size : 337 bp

-----------------------------------------Input data metrics-----------------------------------------

Total reads : 105400318
Aligned reads : 455762
Assembled reads : 418834
Organelle genome % : 0.43 %
Average organelle coverage : 4176


ndierckx commented 9 months ago

Hi,

Yes, it is indeed 0.43%.
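For reference, the 0.43 % agrees with the ratio of Aligned reads to Total reads in the log above. A minimal Python sketch of that arithmetic follows. That NOVOPlasty derives the figure from exactly these two fields is an assumption based on the numbers agreeing, not something confirmed from its source code. The helper name `percent_of_total` is made up for illustration.

```python
# Values copied from the "Input data metrics" block of the log above.
total_reads = 105_400_318
aligned_reads = 455_762
assembled_reads = 418_834


def percent_of_total(reads, total):
    """Return `reads` expressed as a percentage of `total`."""
    return 100.0 * reads / total


# Assumption: "Organelle genome %" corresponds to aligned / total reads.
aligned_pct = percent_of_total(aligned_reads, total_reads)
assembled_pct = percent_of_total(assembled_reads, total_reads)

print(f"Aligned / total:   {aligned_pct:.2f} %")
print(f"Assembled / total: {assembled_pct:.2f} %")
```

The aligned ratio rounds to 0.43 %, which is the value reported in the log. The assembled ratio rounds to 0.40 %.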

AliBasuony2022 commented 9 months ago

Thanks so much,

But the number of raw reads (pairs) for both mitochondrial and nuclear together is 216,237,628. What does the number 105400318 in the Input data metrics refer to? Is it the number of mitochondrial reads?

Sorry, I'm still confused.

Best regards, Ali

ndierckx commented 9 months ago

105400318 is the total number of reads used. You set a max memory, so NOVOPlasty subsampled your data and only used 105400318 reads; it does not use the rest when you subsample. You have a large dataset, so you don't need to use the complete set.
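To relate this to the raw data, here is a rough check in Python. It assumes that the 216,237,628 quoted above counts read pairs, as stated in the question, and that Total reads in the log counts individual reads. Both are assumptions rather than documented behaviour.

```python
# Figures taken from this thread.
raw_pairs = 216_237_628      # raw read pairs reported by the user
reads_used = 105_400_318     # "Total reads" from the Input data metrics

# Assumption: each pair contributes two individual reads.
raw_reads = 2 * raw_pairs

fraction_used = 100.0 * reads_used / raw_reads
print(f"Fraction of raw reads used: {fraction_used:.2f} %")
```

Under these assumptions the result is about 24.4 %, close to the logged Subsampled fraction of 24.14 % but not identical to it. The log does not say where the small difference comes from.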

AliBasuony2022 commented 9 months ago

Good point. Thanks so much, Nicolas.

AliBasuony2022 commented 8 months ago

Dear Nicolas,

Just a follow-up question on this issue.

How do I know the correct % of mtDNA reads, out of the total sequence reads, that mapped to the whole mtDNA? I'm comparing the performance of NOVOPlasty with other de novo assemblers, and this information is important.

When I used different memory settings (all other settings fixed), I got the same length for the largest contig, but different numbers of assembled, aligned and total reads.

Is the subsampled fraction of 99.99 %, reported when setting Max memory = Null, correct? If so, the number of total reads is higher than the number of reads in the raw data. I'm still confused, sorry.

max memory Null log_mito_1_375_12_6_max memory Null.txt

max memory 100 log_mito_1_375_12_3_max memory 100.txt

memory 64 log_mito_1_375_12_max memory 64.txt

Kind regards, Ali
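One way to compare the runs side by side is to pull the Input data metrics out of each log and recompute the aligned-to-total ratio per run. The Python sketch below assumes the field labels appear exactly as in the log posted earlier in this thread. The function name `parse_input_metrics` and the dictionary keys are made up for illustration, and the sample text is the metrics block from the Max memory = 64 run.

```python
import re

# Field labels as they appear in the log above (assumed stable across runs).
FIELDS = {
    "total_reads": r"Total reads\s*:\s*(\d+)",
    "aligned_reads": r"Aligned reads\s*:\s*(\d+)",
    "assembled_reads": r"Assembled reads\s*:\s*(\d+)",
    "organelle_pct": r"Organelle genome %\s*:\s*([\d.]+)",
    "avg_coverage": r"Average organelle coverage\s*:\s*(\d+)",
}


def parse_input_metrics(log_text):
    """Extract the Input data metrics fields found in `log_text`."""
    metrics = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, log_text)
        if match:
            metrics[name] = float(match.group(1))
    return metrics


# Metrics block from the log posted above (Max memory = 64).
sample = (
    "Total reads : 105400318 Aligned reads : 455762 "
    "Assembled reads : 418834 Organelle genome % : 0.43 % "
    "Average organelle coverage : 4176"
)

metrics = parse_input_metrics(sample)
recomputed_pct = 100.0 * metrics["aligned_reads"] / metrics["total_reads"]
print(metrics)
print(f"Recomputed aligned / total: {recomputed_pct:.2f} %")
```

Applying the same function to the text of each attached log gives one ratio per run. Because it is a ratio, it can be compared across runs even when the absolute read counts differ with the memory setting.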