wdecoster / cramino

A *fast* tool for BAM/CRAM quality evaluation, intended for long reads
MIT License

Help understanding Normalized read count per chromosome across technical replicates #28

Closed NikoLichi closed 8 months ago

NikoLichi commented 8 months ago

Dear Wouter,

Thanks again for another super fast processing tool for long-read data.

I am comparing technical replicates from two different sequencing centers and used Cramino to evaluate the normalized read count per chromosome, focusing only on the main chromosomes of the human genome (entries with chr). After some plotting, I noticed a discrepancy from what I expected: the files whose names start with Novo_ have less output, so I expected them to have lower normalized read counts per chromosome. However, I observed the opposite.

Could you please help me understand this behavior? I attached one sample case. Please note that the Novo_ file has almost half as many reads as the other file (~3 million fewer reads).

As an extra question, below the chromosomes there is a metric with the median/mean number of exons. Are those numbers also coverage values for exons? E.g., for the Novo file the median is 2. Does that mean a normalized read count of 2 for each exon across the genome?

Thanks and all the best, Niko

Attachments: 115_T00_16_FC2_F.cramino.txt, Novo_115_T00_FC515_F.cramino.txt

wdecoster commented 8 months ago

So this is RNA-seq, right?

NikoLichi commented 8 months ago

Yes, sorry that I forgot to mention it. It is cDNA from RNA isolation.

wdecoster commented 8 months ago

Okay, number of exons is how often a read was spliced; it doesn't say anything about the coverage per exon. Cramino doesn't have any annotation. Normalized read count makes the most sense for DNA genome sequencing, where the number of reads is expected to be proportional to the length of the chromosome. For genome sequencing you expect each chromosome to have a normalized read count of 1, and you can identify the sex of the individual by looking at the sex chromosome read count, as well as any aneuploidies... I don't think it is of any use for what you are trying to compare.
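To illustrate why a file with fewer reads does not get lower normalized counts, here is a minimal sketch. It assumes the normalization is reads per base on each chromosome, divided by the median of that value across chromosomes, which is consistent with the expectation of 1 per chromosome for genome sequencing. Cramino's actual implementation may differ, and the function name and toy numbers are hypothetical.

```python
from statistics import median


def normalized_read_counts(read_counts, chrom_lengths):
    """Reads per base for each chromosome, scaled so the median chromosome is 1.

    Assumed normalization for illustration only, not necessarily Cramino's.
    """
    density = {c: read_counts[c] / chrom_lengths[c] for c in read_counts}
    mid = median(density.values())
    return {c: d / mid for c, d in density.items()}


# Hypothetical toy genome and read counts
lengths = {"chr1": 1000, "chr2": 500, "chrX": 800}
deep = {"chr1": 2000, "chr2": 1000, "chrX": 800}
shallow = {c: n // 2 for c, n in deep.items()}  # half the sequencing output

print(normalized_read_counts(deep, lengths))
print(normalized_read_counts(shallow, lengths))
```

Under this assumption both calls give the same values, because halving the total output halves every per-base density and the median alike. The metric is relative, so it reflects how reads are distributed over chromosomes, not how many reads there are. For cDNA that distribution follows gene expression rather than chromosome length.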

NikoLichi commented 8 months ago

Thanks for your input, Wouter.

Yes, I have used the read count to check whether the sex of the samples matches our experiment, and it does, as shown in the attached figure. Samples 65 and 81 are males. I agree that normalized read counts are more useful for DNA than for RNA-seq, but they could be helpful in both cases... Anyway, I will stick to other tools/count metrics to measure coverage per gene/transcript, etc., but at least the Cramino metrics on N50 and median length will be useful.
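A sex check of this kind can be sketched as follows. The function name, the chromosome names, and the `y_min` threshold are all hypothetical choices for illustration and would need tuning per data type, especially for RNA-seq, where counts depend on expression.

```python
def infer_sex(normalized, y="chrY", y_min=0.2):
    """Guess sex from normalized read counts per chromosome.

    A clear chrY signal suggests male. The y_min threshold is an
    illustrative assumption, not a validated cutoff.
    """
    return "male" if normalized.get(y, 0.0) >= y_min else "female"


# Hypothetical normalized counts for two samples
print(infer_sex({"chrX": 0.5, "chrY": 0.4}))
print(infer_sex({"chrX": 1.0, "chrY": 0.01}))
```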

Again, thanks a lot, and I'll close this issue. All the best, Niko

Screenshot 2024-03-28 at 15 52 12