zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License

raw counts #36

Closed MusculusMus closed 1 year ago

MusculusMus commented 1 year ago

I ran RASflow in a Docker container and got all the results. RASflow is helpful for running RNA-seq analysis automatically.

I need the raw counts as integers, but neither gene_abundance.tsv nor gene_norm.tsv contains integer values. After searching all the files in the output directory, I found aux_info/ambig_info.tsv in a subfolder; it contains two columns, UniqueCount and AmbigCount. Are they the source of all the values in the tsv files? What do UniqueCount and AmbigCount actually mean?

zhxiaokang commented 1 year ago

From the file names, I can see that you mapped the reads to the transcriptome. In that pipeline, Salmon estimates the abundance of transcripts, and tximport then derives each gene's abundance from its transcripts. As Salmon puts it on its website, "Don't count . . . quantify!": it produces estimates rather than actual read counts.
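
To illustrate why the gene-level values are not integers, here is a minimal Python sketch. It is not RASflow's or tximport's actual code; it only mimics the idea of summing per-transcript estimated read counts into a per-gene total. The transcript IDs, gene IDs, and numbers are hypothetical, and the helper name `gene_level_estimates` is made up for this example.

```python
from collections import defaultdict


def gene_level_estimates(transcript_reads, tx2gene):
    """Sum per-transcript estimated read counts into per-gene totals.

    transcript_reads: dict mapping transcript ID -> estimated number of reads
    tx2gene: dict mapping transcript ID -> gene ID
    """
    totals = defaultdict(float)
    for tx, reads in transcript_reads.items():
        totals[tx2gene[tx]] += reads
    return dict(totals)


# Hypothetical estimates: a read compatible with several transcripts is
# assigned fractionally, so per-transcript values need not be integers.
transcript_reads = {"tx1": 10.5, "tx2": 4.25, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_estimates = gene_level_estimates(transcript_reads, tx2gene)
print(gene_estimates)  # {'geneA': 14.75, 'geneB': 7.0}
```

Because the inputs are fractional estimates, the gene-level sums are generally fractional too, which is why no integer count table appears in the transcriptome output.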

If you want the genes' read counts, I suggest using the other pipeline, "genome", as shown in the workflow chart.

MusculusMus commented 1 year ago

Thanks for your reply. I will try the genome option. You can close the issue then.

zhxiaokang commented 1 year ago

Will close it for now. Don't hesitate to reopen it if you encounter further issues with the genome pipeline.