hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning
GNU General Public License v3.0
96 stars 16 forks source link

The proportion of ribosome reads calculated by ribodetector is not equal to the proportion of ribosome protein count in ReadsPerGene.out.tab generated by STAR #39

Closed li1311139481 closed 1 year ago

li1311139481 commented 1 year ago

I want to test the proportion of ribosome reads in my RNAseq. So I used a ribodetector. The idea of this code is that I use a ribodetector to export the rRNA reads to a fastq file, and then calculate the number of reads in the fastq file, divided by the number of reads in the original fastq file is the rRNA reads ratio My code is as follows

for id in `ls "$@" | grep "_1_trm.fq.gz"`; do
echo "processing $id"
id2=${id%_1_trm.fq.gz}_2_trm.fq.gz
BASE=${id##*/}
BASE=${BASE%_*_trm.fq.gz}
ribodetector_cpu -t ${THREADS}  -l 100 -i ${id} ${id2} -e rrna -o cutadapt/${BASE}_ribodetector.norRNA_1.fq cutadapt/${BASE}_ribodetector.norRNA_2.fq -r cutadapt/${BASE}_ribodetector.rRNA_1.fq cutadapt/${BASE}_ribodetector.rRNA_2.fq --chunk_size 2000
# for read1
rRNA=`cat cutadapt/${BASE}_ribodetector.rRNA_1.fq | grep "@" -c`
total=`cat $id | grep "@" -c`
ratio=`awk 'BEGIN{printf "%.2f%%\n",('$rRNA'/'$total')*100}'`
echo "rRNA reads ratio: $id" >>QC/rRNA_ritio_estimate.txt
awk 'BEGIN{printf "%.2f%%\n",('$rRNA'/'$total')*100}'>>QC/rRNA_ritio_estimate.txt

Then i get ribosome reads ratio like this:

rRNA reads ratio: /cluster/facility/hlhuang/zifeng001/ZF053_activated_human_T_RNAseq/20230702_AHG32GDSX7_U1244_ZF053-NJU-1_split//cutadapt/dT30VN-S1_1_trm.fq.gz
0.13%
rRNA reads ratio: /cluster/facility/hlhuang/zifeng001/ZF053_activated_human_T_RNAseq/20230702_AHG32GDSX7_U1244_ZF053-NJU-1_split//cutadapt/dT30VN-S2_1_trm.fq.gz
0.14%
rRNA reads ratio: /cluster/facility/hlhuang/zifeng001/ZF053_activated_human_T_RNAseq/20230702_AHG32GDSX7_U1244_ZF053-NJU-1_split//cutadapt/dT30VN-S3_1_trm.fq.gz
0.14%

Next, i use STAR to align the fastq file to human genome. after that i get ReadsPerGene.out.tab file which contains raw count of every gene So I calculated the total number of counts some ribosomal protein genes, including RPLP0,RPLP1,RPLP2,RPL3,RPL3L,RPL4,RPL5,RPL6,RPL7,RPL7A,RPL7L1,RPL8,RPL9,RPL10,RPL10A,RPL10L,RPL11,RPL12,RPL13,RPL13A,RPL14,RPL15,RPL17,RPL18,RPL18A,RPL19,RPL21,RPL22,RPL22L1,RPL23,RPL23A,RPL24,RPL26,RPL26L1,RPL27,RPL27A,RPL28,RPL29,RPL30,RPL31,RPL32,RPL34,RPL35,RPL35A,RPL36,RPL36A,RPL36AL,RPL37,RPL37A,RPL38,RPL39,RPL39L,UBA52,RPL41,RPSA,RPS2,RPS3,RPS3A,RPS4X,RPS4Y1,RPS4Y2,RPS5,RPS6,RPS7,RPS8,RPS9,RPS10,RPS11,RPS12,RPS13,RPS14,RPS15,RPS15A,RPS16,RPS17,RPS18,RPS19,RPS20,RPS21,RPS23,RPS24,RPS25,RPS26,RPS27,RPS27A,RPS27L,RPS28,RPS29,FAU. I add up the counts of these genes and divide by the total counts of all the genes. The ribosome reads I got were 23%. So can you tell me why they are different ? Which result should I believe? Thank you

dawnmy commented 1 year ago

Hi, thank you for trying out RiboDetector. I think here you are comparing two different things. RiboDetector identifies rRNA reads (non-coding). And you used STAR to count the ribosomal protein reads (they are not rRNA but proteins associated with ribosome). Ribosomal protein are proteins in conjunction with rRNA, that form the ribosomal subunits involved in translation.

li1311139481 commented 1 year ago

Thank you, I've learned.