waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

Normalization applied in RNASeq2GeneNorm #50

Status: Closed (mlmms closed this issue 2 years ago)

mlmms commented 2 years ago

Hello there! What normalization is applied to the gene expression values in the "RNASeq2GeneNorm" assay, as opposed to the "RNASeqGene" assay? The only description I can find online is "Upper quartile normalized RSEM TPM gene expression values", which is not specific enough. Thank you.

lwaldron commented 2 years ago

Hi @mlmms, these data were processed by the Broad GDAC Firehose pipeline, which we have no involvement in, so I'll have to refer you to their documentation. That documentation now seems to exist only in the original TCGA publications and on the Wayback Machine (https://web.archive.org/web/20161226221728/https://confluence.broadinstitute.org/display/GDAC/Documentation). As far as I remember, RNASeq2GeneNorm used RSEM:

Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011, 12:323.

Please let us know if you can find a current canonical reference for GDAC Firehose pipelines!

LiNk-NY commented 2 years ago

https://broadinstitute.atlassian.net/wiki/spaces/GDAC/pages/844334346/Documentation#Documentation-RNAseqPipelines
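
For readers unsure what "upper quartile normalized" means in the description quoted in the question, here is a minimal Python sketch of one common form of the technique: each sample's values are divided by that sample's 75th percentile and rescaled to a fixed constant. It is not taken from the Firehose pipeline. The function name `upper_quartile_normalize`, the choice to compute the quartile over non-zero values only, and the scale factor of 1000 are all assumptions made for illustration; the linked GDAC documentation is the place to check what the pipeline actually does.

```python
import numpy as np


def upper_quartile_normalize(counts, scale=1000.0):
    """Generic upper-quartile normalization (illustrative, hypothetical helper).

    counts: 2-D array-like, rows = genes, columns = samples.
    scale:  value the per-sample upper quartile is rescaled to
            (1000 is an assumption, not confirmed by this thread).
    """
    counts = np.asarray(counts, dtype=float)
    normalized = np.zeros_like(counts)
    for j in range(counts.shape[1]):
        sample = counts[:, j]
        # Assumption: the quartile is computed over expressed (non-zero) genes.
        expressed = sample[sample > 0]
        if expressed.size == 0:
            continue  # nothing expressed in this sample; leave zeros
        upper_quartile = np.percentile(expressed, 75)
        normalized[:, j] = sample / upper_quartile * scale
    return normalized


# Two toy samples; the second is sequenced at exactly twice the depth.
toy = np.array([[0, 0], [10, 20], [20, 40], [30, 60], [40, 80]])
result = upper_quartile_normalize(toy)
```

In the toy input the second sample is the first one scaled by two, so after normalization the two columns coincide. Removing that kind of depth difference between samples is the purpose of the step.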