wwylab / DeMixT

GNU General Public License v3.0

What is the input from RNAseq data? Is the output ready for downstream statistical analysis? #18

Closed guodudou closed 2 years ago

guodudou commented 2 years ago

Hello,

I am interested in using DeMixT to perform cancer profile deconvolution of our RNA-seq data. I have RSEM-estimated read counts. What else should I do to the counts before inputting them into DeMixT? Is the output ready for downstream statistical analysis, or do I have to perform additional normalization or transformation?

In addition, regarding the normal references: if our samples are metastatic castration-resistant prostate cancer, should I still use normal prostate tissue as the reference? Can I start with read counts from GTEx or GDC, or is it better to process from raw sequencing data to eliminate potential batch effects?

Thank you very much! I look forward to hearing from you.

Best, Wendy

jiyunmaths commented 2 years ago

@guodudou Thanks for using DeMixT in your project. The inputs of DeMixT are the raw read counts from normal (non-tumor) and tumor samples. The normal samples are used as the reference. Ideally, the normal sample should come from the same organ of the same patient as the tumor sample. If such data are not available, you can also use the normal samples from TCGA or GTEx of the same cancer type to deconvolve the primary tumor samples.

The pipeline of DeMixT deconvolution involves the following steps:

  1. Obtain raw read counts from the RNA-seq data. Instead of using RSEM read counts, we suggest following the GDC mRNA analysis pipeline to obtain raw read counts: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
  2. Use hierarchical clustering to remove suspicious normal samples whose expression profiles resemble those of the tumor samples.
  3. Normalize the expression values of both tumor and normal samples using scale normalization. If the tumor samples come from different datasets with strong batch effects, we recommend using ComBat to correct the batch effects among the tumor samples before applying scale normalization. If the normal samples come from TCGA or GTEx but the tumor samples come from your own study, we do not recommend correcting batch effects between normal and tumor samples, since that would over-correct the signal between them.
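
Steps 2 and 3 can be sketched roughly as follows. This is an illustration in plain Python, not part of DeMixT (which is an R package), and every function name here is hypothetical. It assumes that "scale normalization" means scaling each sample by its upper quartile, which the thread does not specify, and it uses a small average-linkage clustering on log2 counts to flag normal samples that group with the tumors. ComBat batch correction is not shown.

```python
import math
from statistics import mean, quantiles


def pearson_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)


def two_clusters(profiles):
    """Average-linkage agglomerative clustering, stopped at two clusters."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def linkage(c1, c2):
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[n] for n in names]
    while len(clusters) > 2:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def suspicious_normals(tumor, normal):
    """Normal samples that fall into a tumor-majority cluster.

    Assumes sample names are unique across the two dictionaries.
    """
    log_profiles = {s: [math.log2(x + 1) for x in v]
                    for s, v in {**tumor, **normal}.items()}
    flagged = []
    for cluster in two_clusters(log_profiles):
        n_tumor = sum(s in tumor for s in cluster)
        if n_tumor > len(cluster) / 2:
            flagged += [s for s in cluster if s in normal]
    return flagged


def scale_normalize(counts):
    """Scale every sample so that all share the same upper quartile.

    Assumption: upper-quartile scaling; a zero upper quartile would fail.
    """
    uq = {s: quantiles(v, n=4)[2] for s, v in counts.items()}
    target = mean(uq.values())
    return {s: [x * target / uq[s] for x in v] for s, v in counts.items()}


# Toy data: sample name -> counts for the same five genes (made up).
tumor = {"T1": [100, 10, 500, 20, 5],
         "T2": [120, 12, 480, 25, 6],
         "T3": [90, 8, 520, 18, 4]}
normal = {"N1": [10, 200, 30, 400, 50],
          "N2": [12, 220, 28, 380, 55],
          "N3": [110, 11, 510, 22, 5]}  # looks like a tumor profile

flagged = suspicious_normals(tumor, normal)
kept_normal = {s: v for s, v in normal.items() if s not in flagged}
normalized = scale_normalize({**tumor, **kept_normal})
print(flagged)
```

On this toy input only N3 is flagged, because its profile correlates with the tumor samples. With real data you would inspect the dendrogram rather than rely on a fixed two-cluster cut.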

After the above steps, the read counts from the tumor and normal samples are fed into DeMixT as data.Y and data.N1, respectively. DeMixT returns a list object; let's call it res. res$pi holds the component proportions, res$ExprT the tumor-specific expression, and res$ExprN1 the normal-specific expression.
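
For orientation, the two-component mixing model described in the DeMixT documentation can be written as below. Here $Y_{ig}$ is the observed expression of gene $g$ in mixed tumor sample $i$, $N_{1,ig}$ and $T_{ig}$ are the unobserved normal and tumor component expressions, and $\pi_{1,i}$ is the proportion of the normal component. The notation comes from the documentation rather than from this thread, so treat the mapping of these symbols to res$pi, res$ExprN1 and res$ExprT as an assumption to check against the vignette.

$$
Y_{ig} = \pi_{1,i}\, N_{1,ig} + (1 - \pi_{1,i})\, T_{ig}
$$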

DeMixT can be run on metastatic tumor samples when a proper reference profile is available, but such a reference profile is challenging to obtain. We do not recommend using normal prostate tissue samples to deconvolve the metastatic castration-resistant prostate cancer samples.

More details about DeMixT can be found in the Bioconductor vignette: https://www.bioconductor.org/packages/release/bioc/vignettes/DeMixT/inst/doc/demixt.html. We will also upload a comprehensive tutorial about DeMixT to the GitHub repository (https://github.com/wwylab/DeMixT) in the coming days and will let you know when it is available. You can follow it for your own project.

guodudou commented 2 years ago

@jiyunmaths Thank you so much for such detailed and constructive suggestions. It is unfortunate that I will not be able to apply DeMixT to the mCRPC samples, but this makes total sense when I think about the logic behind DeMixT deconvolution. I look forward to the comprehensive tutorial and to applying DeMixT to primary tumor samples soon!