wwylab / DeMixT

GNU General Public License v3.0

Code running gets stuck #20

Closed: zhaoliang0302 closed this issue 2 years ago

zhaoliang0302 commented 2 years ago

Hi, I followed the tutorial https://wwylab.github.io/DeMixT/tutorial.html. After DeMixT had been running for about 20 minutes, the R console showed:

Break at 4
Objective function in each step: 
2401454.61395679  2352853.16143218  2352774.0343647  2352756.58892825

Initial of Proportions:
             PiN1
Sample 1   0.4746
Sample 2   0.8616
...
Sample 689 0.5714

and then it got stuck. CPU usage dropped from 100% to 30%, and I can't tell whether the program is still running. My PC runs Windows 10 64-bit with 6 cores / 12 threads. Thanks!

jiyunmaths commented 2 years ago

Hi @zhaoliang0302, sorry to hear that. Can you describe the problem in more detail? What inputs and parameter settings are you using to run DeMixT? I will try to help you fix it. Thanks.

zhaoliang0302 commented 2 years ago

The input is a TCGA tumor transcriptome raw-count matrix with over 50,000 genes and 800 samples. After DeMixT_preprocessing, the filtered data contain about 9,000 genes. Following the tutorial, I used this code:

nspikesin_list = c(0, 50, 100, 150)
ngene.selected_list = c(500, 1000, 1500, 2500)

for(nspikesin in nspikesin_list){
    for(ngene.selected in ngene.selected_list){
        name = paste("PRAD_demixt_GS_res_nspikesin", nspikesin, "ngene.selected",
                     ngene.selected, sep = "_")
        name = paste(name, ".RData", sep = "")
        res = DeMixT(data.Y = data.Y,
                     data.N1 = data.N1,
                     ngene.selected.for.pi = ngene.selected,
                     ngene.Profile.selected = ngene.selected,
                     filter.sd = 0.7, # same upper bound of gene expression standard deviation 
                     # for normal reference. i.e., preprocessed_data$sd_cutoff_normal[2]
                     gene.selection.method = "GS",
                     nspikein = nspikesin)
        save(res, file = name)
    }
}
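A rough illustration of why the nested loop takes so long: it calls DeMixT once for every combination of the two settings, so 4 values of nspikesin times 4 values of ngene.selected means 16 separate DeMixT runs. The base-R sketch below only enumerates that grid and the file names the loop would write; it does not call DeMixT.

```r
# Enumerate the parameter grid used by the nested loop (no DeMixT call here).
nspikesin_list <- c(0, 50, 100, 150)
ngene.selected_list <- c(500, 1000, 1500, 2500)

# One row per (nspikesin, ngene.selected) combination, i.e. one DeMixT run each.
grid <- expand.grid(nspikesin = nspikesin_list,
                    ngene.selected = ngene.selected_list)

# File name each run would be saved to, built the same way as in the loop.
grid$file <- paste0(paste("PRAD_demixt_GS_res_nspikesin", grid$nspikesin,
                          "ngene.selected", grid$ngene.selected, sep = "_"),
                    ".RData")

nrow(grid)        # number of DeMixT runs the loop performs
head(grid$file, 2)
```

Running a single combination instead needs only one DeMixT call, which is why picking one setting reduces the total runtime so much.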

It ran for about 2 hours and seemed stuck. I suspected that looping over every combination of settings was taking too long, so I changed the code to a single run:

nspikesin = 100
ngene.selected = 1500

name = paste("PRAD_demixt_GS_res_nspikesin", nspikesin, "ngene.selected",
             ngene.selected, sep = "_")
name = paste(name, ".RData", sep = "")
res = DeMixT(data.Y = data.Y,
             data.N1 = data.N1,
             ngene.selected.for.pi = ngene.selected,
             ngene.Profile.selected = ngene.selected,
             filter.sd = 0.7, # same upper bound of gene expression standard deviation
             # for normal reference. i.e., preprocessed_data$sd_cutoff_normal[2]
             gene.selection.method = "GS",
             nspikein = nspikesin)
save(res, file = name)

This time it finished. However, the output tumor-specific expression matrix contains only ~7,000 genes, and the genes I'm interested in are missing from it. Is this the expected result?

dim(res$ExprT)
# 7318 889

Thanks!

jiyunmaths commented 2 years ago

@zhaoliang0302 Sorry for the delayed response. DeMixT filters out genes whose expression differs little between the tumor and normal components, as well as genes with large expression variation in the tumor component, so an output with fewer genes than the input is expected. Since your data have a large number of samples, you can increase nspikesin (passed as the nspikein argument) to 200 and run again. Thanks.
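To tell whether a gene of interest was removed inside DeMixT or was already gone before the call (for example during DeMixT_preprocessing), one option is to compare row names of the input and output matrices. This is a minimal sketch assuming data.Y and res$ExprT carry gene identifiers as row names (not confirmed in this thread); check_genes is a hypothetical helper, not part of DeMixT, and the toy matrices only stand in for the real data.

```r
# Hypothetical helper (not part of DeMixT): for each gene of interest, report
# whether it is present in the input matrix and in the deconvolved output.
# Assumes both matrices use gene identifiers as row names.
check_genes <- function(genes, input_mat, output_mat) {
  data.frame(
    gene = genes,
    in_input = genes %in% rownames(input_mat),
    in_output = genes %in% rownames(output_mat),
    stringsAsFactors = FALSE
  )
}

# Toy stand-ins for data.Y and res$ExprT, only to make the sketch runnable.
data.Y_toy <- matrix(rpois(12, 10), nrow = 4,
                     dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"), NULL))
ExprT_toy <- data.Y_toy[c("GENE_A", "GENE_C"), , drop = FALSE]

out <- check_genes(c("GENE_B", "GENE_C", "GENE_X"), data.Y_toy, ExprT_toy)
print(out)
```

With real data the call would look like check_genes(my_genes, data.Y, res$ExprT), where my_genes is your own vector of gene IDs. A gene with in_input TRUE but in_output FALSE was dropped during the DeMixT run; a gene with in_input FALSE never reached DeMixT.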