wwylab / DeMixT

GNU General Public License v3.0

unexpected results from simulated data #24

Closed arraytools closed 1 month ago

arraytools commented 1 year ago

I ran DeMixT on simulated data built from true tumor and normal profiles from TCGA, but the estimated tumor purity is not what I expected. For example, when I generate data from a 2-component model with a true tumor purity of 0 or a very small number, the estimated tumor purity can be very large.

I then tried another data set, generated with the simulate_2comp() function included in DeMixT, and the estimated tumor purities are again not what they should be.

Case 1: both Y and N1 are from the normal samples. The true PiT is 0.

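library(DeMixT)

# Simulate a 2-component data set: 500 genes, 100 mixed samples, 100 normal reference samples
set.seed(1)
test.data <- simulate_2comp(G = 500, My = 100, M1 = 100)

# Pass the normal reference samples as both the "mixed" input and the reference
set.seed(1)
res <- DeMixT(data.Y = test.data$data.N1,
              data.N1 = test.data$data.N1,
              gene.selection.method = "GS", nthread = 64)
# Distribution of the estimated proportions across samples
summary(t(res$pi))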
 #      PiN1              PiT
 # Min.   :0.03023   Min.   :0.7668
 # 1st Qu.:0.09272   1st Qu.:0.8853
 # Median :0.09639   Median :0.9036
 # Mean   :0.10780   Mean   :0.8922
 # 3rd Qu.:0.11474   3rd Qu.:0.9073
 # Max.   :0.23322   Max.   :0.9698

Case 2: both Y and N1 are the simulated mixed tumor samples (data.Y). Because Y is identical to the reference, the true PiT is 0.

# Pass the simulated mixed samples as both the "mixed" input and the reference
set.seed(1)
res <- DeMixT(data.Y = test.data$data.Y,
              data.N1 = test.data$data.Y,
              gene.selection.method = "GS", nthread = 64)
summary(t(res$pi))
 #      PiN1              PiT
 # Min.   :0.04195   Min.   :0.2704
 # 1st Qu.:0.07842   1st Qu.:0.6610
 # Median :0.23636   Median :0.7636
 # Mean   :0.23536   Mean   :0.7646
 # 3rd Qu.:0.33900   3rd Qu.:0.9216
 # Max.   :0.72957   Max.   :0.9580

Am I missing something, or do any parameters need to be tweaked? Thanks.

jiyunmaths commented 1 year ago

@arraytools DeMixT requires the gene expression profiles of the mixed tumor samples and the normal samples to be different. In the preprocessing step, we use hierarchical clustering and PCA to visually check whether the mixed tumor samples and the normal samples separate, and we remove the samples that do not separate well (please check the tutorial). So the scenarios you describe here do not arise in real applications of DeMixT. Thanks.
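For readers who want to try this kind of separation check, here is a minimal sketch in base R. It is not the code from the DeMixT tutorial: the toy matrices, the variable names, the Euclidean distance, and the complete-linkage clustering are all assumptions chosen for illustration.

```r
# Hypothetical toy data on a log2-like scale: genes in rows, samples in columns.
set.seed(1)
n_genes <- 200
n_per_group <- 10
normal <- matrix(rnorm(n_genes * n_per_group, mean = 8, sd = 1), nrow = n_genes)
mixed  <- matrix(rnorm(n_genes * n_per_group, mean = 8, sd = 1), nrow = n_genes)
# Shift the first 100 genes in the mixed samples so the two groups differ.
mixed[1:100, ] <- mixed[1:100, ] + 3

expr <- cbind(normal, mixed)
colnames(expr) <- c(paste0("N", seq_len(n_per_group)),
                    paste0("Y", seq_len(n_per_group)))
group <- factor(rep(c("normal", "mixed"), each = n_per_group))

# Hierarchical clustering of samples (dist() works on rows, so transpose).
hc <- hclust(dist(t(expr)), method = "complete")
clusters <- cutree(hc, k = 2)
print(table(group, clusters))

# PCA of samples; prcomp() also expects samples in rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Draw the dendrogram and the first two PCs only in an interactive session.
if (interactive()) {
  plot(hc)
  plot(pca$x[, 1:2], col = as.integer(group), pch = 19)
}
```

If the cross-tabulation puts each group in its own cluster and the two groups occupy separate ranges on PC1, the samples separate in the sense described above. Samples that land in the other group's cluster would be the candidates for removal.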

arraytools commented 1 year ago

Thank you for the explanation.

The scenario I asked about is an extreme case. The real case is this: when I mixed tumor and normal samples from TCGA BRCA data with a tumor proportion of 0.25, DeMixT returned high estimates of the tumor proportion (the median and mean are around 0.7). That was quite a surprise to me. Removing samples that do not separate well does not seem practical, since the samples look mixed in the dendrogram. I think some tweaking of the parameters could help improve the estimates. Do you have any suggestions?
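To make the mixing step concrete, here is a minimal sketch of an in silico mixture at a fixed tumor proportion. It assumes a linear mixture on the raw (not log) expression scale and uses hypothetical log2-normal toy profiles in place of the TCGA BRCA data; the names tumor, normal, pi_t and recover_pi are made up for illustration.

```r
# Hypothetical pure profiles on the raw scale: genes in rows, samples in columns.
set.seed(1)
n_genes <- 100
n_samples <- 5
tumor  <- matrix(2^rnorm(n_genes * n_samples, mean = 9, sd = 1), nrow = n_genes)
normal <- matrix(2^rnorm(n_genes * n_samples, mean = 7, sd = 1), nrow = n_genes)

# Mix each tumor sample with the matching normal sample at a fixed proportion.
pi_t <- 0.25
mixed <- pi_t * tumor + (1 - pi_t) * normal

# Sanity check on the mixing only: with both pure profiles known, the
# proportion follows from a per-sample least-squares fit. This is NOT what
# DeMixT does; DeMixT has to estimate the unobserved tumor profile as well.
recover_pi <- function(j) {
  d <- tumor[, j] - normal[, j]
  sum((mixed[, j] - normal[, j]) * d) / sum(d * d)
}
pi_hat <- vapply(seq_len(n_samples), recover_pi, numeric(1))
print(pi_hat)
```

A check like this confirms only that the mixtures were built with the intended proportion, so any gap between 0.25 and the DeMixT estimates comes from the deconvolution step and not from the mixing.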

Thanks again.

wwylab commented 1 month ago

This is an extreme case that is unique to the researchers' study design. We therefore would like to close the issue.