wwylab / DeMixT

GNU General Public License v3.0

gene expression input data for DeMixT #12

Closed: liangdp1984 closed this issue 3 years ago

liangdp1984 commented 4 years ago

Nice tool for the cancer research community! For RNA-seq data, should raw counts or normalized counts (such as CPM, FPKM, or TPM) be used as input? For microarray data, should log2-transformed or non-log linear intensity values be used as input? Can you give us some suggestions?

pengyang0411 commented 4 years ago

Hi Liang, Thank you for showing interest in our package. For RNA-seq data, we recommend using scale-normalized read counts, with both normal control and mixed tumor samples, as input values. However, you can choose your own normalization method to reduce the batch effect before starting deconvolution. If you use CPM or TPM, the values should be converted back to raw count data. For microarray data, non-log linear intensity values are recommended. Please let me know if you have more questions. Best, Peng
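Converting CPM back to counts only needs the original library sizes, because CPM is each count divided by its sample's library size and multiplied by one million. The sketch below is a minimal illustration, assuming a genes x samples NumPy matrix and that the library sizes used to compute the CPM values are known; `cpm_to_counts` is a hypothetical helper, not part of DeMixT. TPM additionally depends on gene lengths, so it is not covered here.

```python
import numpy as np

def cpm_to_counts(cpm, lib_sizes):
    """Recover approximate raw counts from a genes x samples CPM matrix.

    Assumption: `lib_sizes` holds the same per-sample library sizes that
    were used when the CPM values were computed.
    """
    cpm = np.asarray(cpm, dtype=float)
    lib_sizes = np.asarray(lib_sizes, dtype=float)
    # CPM = count / library_size * 1e6, so invert it sample by sample (column-wise)
    counts = cpm * lib_sizes[np.newaxis, :] / 1e6
    # Counts are whole numbers, so round to the nearest integer
    return np.rint(counts).astype(np.int64)

cpm = np.array([[500000.0, 250000.0],
                [500000.0, 750000.0]])
print(cpm_to_counts(cpm, [200, 400]))
```

If the CPM values were computed with effective library sizes (library size times a normalization factor), those effective sizes would have to be passed instead.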

liangdp1984 commented 3 years ago

I have read your paper comparing DeMixT and ISOpure. In the Methods section, you generated a reference table from the human reference genome hg19 and then used the function findOverlaps to count the number of reads mapped to each exon for all samples. This count dataset was pre-processed by total count normalization, and genes that contained zero counts were removed. The pre-processed count data were used as input for DeMixT and ISOpure. So I guess "scale-normalized read counts" refers to total count normalization: gene counts are divided by the total number of mapped reads (the library size) and multiplied by the mean total count across all samples in the dataset?
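The total count normalization described in the paper's Methods can be sketched in NumPy as below. This is an illustrative sketch, not the authors' code: it assumes a genes x samples matrix, takes each library size as the column sum of the full (unfiltered) matrix, and interprets "genes that contained zero counts" as genes with a zero in any sample. `total_count_normalize` is a hypothetical name.

```python
import numpy as np

def total_count_normalize(counts):
    """Total count normalization of a genes x samples count matrix.

    Returns the normalized matrix and the boolean mask of genes kept.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: library size = total mapped reads per sample, before filtering
    lib_sizes = counts.sum(axis=0)
    mean_lib = lib_sizes.mean()  # mean total count across all samples
    # Assumption: drop genes that have a zero count in any sample
    keep = np.all(counts > 0, axis=1)
    # Divide each sample (column) by its library size, then rescale by the mean
    normalized = counts[keep] / lib_sizes * mean_lib
    return normalized, keep

norm, kept = total_count_normalize([[10, 20], [30, 60], [0, 20]])
print(norm)
print(kept)
```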

pengyang0411 commented 3 years ago

Hi Dapeng,

We applied scale normalization at the 75th percentile using the DSS package (Wu et al., 2013). To do so, we first combine the normal samples and the mixed tumor samples, and then use the DSS package to calculate a scale factor for each sample. Gene counts for each sample are then divided by its scale factor so that the 75th percentile is equal across all samples.
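The idea of upper-quartile scaling can be illustrated with a small NumPy sketch. It is not the DSS implementation: how DSS computes and rescales its factors may differ, and here the factors are assumed to be each sample's 75th percentile divided by the mean of those percentiles, so they average to 1. `upper_quartile_normalize` is a hypothetical helper operating on genes x samples matrices.

```python
import numpy as np

def upper_quartile_normalize(normal_counts, tumor_counts):
    """Upper-quartile (75th percentile) scaling of combined normal + tumor samples."""
    normal = np.asarray(normal_counts, dtype=float)
    tumor = np.asarray(tumor_counts, dtype=float)
    # Combine normal and mixed tumor samples column-wise (samples are columns)
    combined = np.hstack([normal, tumor])
    q75 = np.percentile(combined, 75, axis=0)  # 75th percentile per sample
    # Assumption: center the factors at 1 so values stay on a count-like scale
    scale = q75 / q75.mean()
    normalized = combined / scale
    # After division every sample has the same 75th percentile (q75.mean())
    n_normal = normal.shape[1]
    return normalized[:, :n_normal], normalized[:, n_normal:], scale

normal = [[1, 2], [2, 4], [3, 6], [4, 8]]
tumor = [[2], [4], [6], [8]]
norm_n, norm_t, factors = upper_quartile_normalize(normal, tumor)
print(factors)
```

Because percentiles scale linearly with a positive constant, dividing each column by its factor makes all 75th percentiles equal, which is the property described in the reply.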

Best, Peng

liangdp1984 commented 3 years ago

Upper-quartile normalization! After deconvolution, how can I convert the scale-normalized counts of the in silico dissected tumor component back to raw counts? Do I just multiply by the scale factor?

wwylab commented 3 years ago

You don't need to convert it back. The dissected tumor expression values are relative expression values, which can be used directly for most downstream analyses.

liangdp1984 commented 3 years ago

What is the best choice of relative expression values for differential expression analysis? DESeq2 and edgeR require raw counts, and limma-voom is usually paired with TMM normalization. Is wilcox.test suitable in this situation? I usually use limma-voom to detect differentially expressed genes in large datasets. Could I use TMM-normalized counts as DeMixT input and then pass the dissected relative expression output by DeMixT directly to limma? Thanks!

ShaolongCao commented 3 years ago

The deconvolved tumor component can be treated as raw counts of the tumor component. You just need to round the values of the deconvolved matrix to the nearest integer. Then you can use DESeq2, edgeR, or a TMM-based workflow to do DE analysis.
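The rounding step described above can be sketched in NumPy, assuming the deconvolved tumor matrix has been loaded as a genes x samples array. `to_integer_counts` is a hypothetical helper; clipping negative values to zero is an added safeguard, not something the maintainers mention. Note that `np.rint` rounds exact halves to the nearest even integer.

```python
import numpy as np

def to_integer_counts(deconvolved):
    """Round a deconvolved tumor expression matrix to non-negative integer counts."""
    m = np.asarray(deconvolved, dtype=float)
    # Assumption: guard against any negative values, since count-based
    # DE tools expect non-negative integers
    m = np.clip(m, 0, None)
    # Round to the nearest integer (exact halves go to the even neighbor)
    return np.rint(m).astype(np.int64)

tumor = np.array([[1.4, 2.6],
                  [0.5, -0.2]])
print(to_integer_counts(tumor))
```

The resulting integer matrix could then be written out (for example with `np.savetxt(..., fmt="%d")`) and loaded into R for DESeq2 or edgeR.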

liangdp1984 commented 3 years ago

Thank you very much for the explanation!