zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License

transcriptome analysis #33

Closed wangjiawen2013 closed 1 year ago

wangjiawen2013 commented 1 year ago

Hi, thanks for your work. We're trying to use RASflow to process our RNA-seq data. In RASflow, a transcriptome file (line 55 in config_main.yaml, TRANS: data/example/ref/transcriptome/Homo_sapiens.GRCh38.cdna.all.1.1.10M.fa.gz) is used to quantify transcripts. However, featureCounts has a parameter "-g transcript_id", which can also be used to quantify transcripts using the genome file (not the transcriptome file). I think this is more convenient, because we would only need to provide the genome file to get both gene and transcript quantification. Is there any difference between these two methods?

zhxiaokang commented 1 year ago

Hi, featureCounts is designed to quantify expression at the gene level and is not suitable for transcript quantification, as mentioned in its user guide:

When assigning reads to genes or exons, most reads can be successfully assigned without
ambiguity. However if reads are to be assigned to transcripts, due to the high overlap between
transcripts from the same gene, many reads will be found to overlap more than one transcript
and therefore cannot be uniquely assigned. Specialized transcript-level quantification tools
are recommended for counting reads to transcripts. Such tools use model-based approaches
to deconvolve reads overlapping with multiple transcripts.
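
To make the ambiguity concrete, here is a minimal toy sketch in Python. It is not how featureCounts is implemented; the transcript names, exon coordinates, and reads are all made up for illustration. It only shows that when two isoforms share exons, most reads overlap both and cannot be uniquely assigned by simple overlap counting.

```python
from collections import Counter

# Hypothetical gene with two isoforms that share exons 1 and 2
# (all coordinates are invented for this illustration).
transcripts = {
    "tx_A": [(100, 200), (300, 400), (500, 600)],
    "tx_B": [(100, 200), (300, 400)],  # lacks the third exon
}

# Hypothetical aligned reads as half-open (start, end) genomic intervals.
reads = [(120, 170), (310, 360), (520, 570), (150, 190)]


def overlaps(read, exons):
    """Return True if the read overlaps any exon of the transcript."""
    r_start, r_end = read
    return any(r_start < e_end and e_start < r_end for e_start, e_end in exons)


def assign(reads, transcripts):
    """Count uniquely assignable reads per transcript and tally ambiguous ones."""
    counts = Counter({tx: 0 for tx in transcripts})
    ambiguous = 0
    for read in reads:
        hits = [tx for tx, exons in transcripts.items() if overlaps(read, exons)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            # The read is compatible with several isoforms.
            ambiguous += 1
    return counts, ambiguous


counts, ambiguous = assign(reads, transcripts)
print(dict(counts), "ambiguous:", ambiguous)
```

In this toy case only the read in the isoform-specific third exon is assigned; the three reads in the shared exons are ambiguous. Model-based tools address this by estimating, rather than counting, how such reads are distributed among the isoforms.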

Therefore, if you want both gene and transcript quantification, the recommended way is to take the "transcriptome" quantification path and then ask for "gene-level" DEA. tximport is then applied to generate gene-level abundances, and it is designed to work well with salmon, as stated here: Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences
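
As a rough illustration of the idea behind gene-level summarization, the sketch below sums transcript-level estimated counts per gene using a transcript-to-gene map. This is a simplified Python stand-in, not tximport itself (which is an R package and also handles things like transcript-length offsets); the transcript names, gene names, and counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical transcript-level estimated counts, of the kind a
# transcript quantifier reports (values are invented).
tx_counts = {"tx_A": 120.5, "tx_B": 30.0, "tx_C": 10.0}

# Hypothetical transcript-to-gene mapping.
tx2gene = {"tx_A": "gene_1", "tx_B": "gene_1", "tx_C": "gene_2"}


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimates into gene-level totals."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


gene_counts = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

The point is that one transcript-level quantification run yields both levels: the transcript estimates directly, and the gene estimates by aggregation.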

Hope this clears up your doubts.