wwylab / DeMixT

GNU General Public License v3.0
32 stars 14 forks source link

Getting error: `DeMixT Estimation Failed, try different initial values for pi` #26

Closed ncyue2 closed 11 months ago

ncyue2 commented 11 months ago

Hi, I am following closely the tutorial on https://wwylab.github.io/DeMixT/tutorial.html, and the codes run perfectly fine.

But when i change the data to my own data, I received the error:

Error in DeMixT_GS(data.Y = data.Y, data.N1 = data.N1, data.N2 = data.N2, : 
DeMixT Estimation Failed, try different initial values for pi

when i try to execute the line:

res <- DeMixT(data_Y, data_N1, data_N2, niter = 5,
  nthread = 50,  if.filter = TRUE
)

and I found out that this res <- DeMixT(data_Y, data_N1, data_N2, niter = 5, nthread = 50, if.filter = TRUE) only cause the error whenniter is set to 5 or above.

I have tried to set filter.sd = 5, ngene.selected.for.pi = 0, or ngene.selected.for.pi = 50, but same error occurs. I am wondering what is causing the problem and any possible solution to the problems. Thank you.

Below are my complete codes:

library(io)
library(DeMixT)

norm_counts <- qread("./normalized_counts.rds");
pseudo_bulk_lst <- qread("./pseudo_bulk_lst.rds");
pseudo_bulk_names <- names(pseudo_bulk_lst);

## match the gene symbols of the rna-seq and pseudo_bulk norm_counts
common_gene_symbols <- intersect(rownames(norm_counts), names(pseudo_bulk_lst[[1]])) # values are randomly chosen
norm_counts <- norm_counts[match(common_gene_symbols, rownames(norm_counts)), ]
pseudo_bulk_lst <- lapply(pseudo_bulk_lst, function(x){
  x <- x[match(common_gene_symbols, names(x))]
})
names(pseudo_bulk_lst) <- pseudo_bulk_names;

# combine the data
pseudo_bulk_stromal <- data.frame(pseudo_bulk_lst[grep("stromal", names(pseudo_bulk_lst))]);
pseudo_bulk_immune <- data.frame(pseudo_bulk_lst[grep("immune", names(pseudo_bulk_lst))]); 

## extract tcga data
norm_counts <- norm_counts[, grepl("tcga", colnames(norm_counts))];

## combine
norm_counts_combined <- data.frame(norm_counts, pseudo_bulk_stromal, pseudo_bulk_immune);
colnames(norm_counts_combined) <- gsub("\\.", "-", colnames(norm_counts_combined))

# select genes with small variation in gene expression across samples
tumor_ids <- colnames(norm_counts);
normal_ids <- c(colnames(pseudo_bulk_stromal), colnames(pseudo_bulk_immune));

num_gene_remaining_diff_cutoffs <- 
  subset_sd_gene_remaining(norm_counts_combined, normal_ids, tumor_ids,
    cutoff_normal_range = c(0.1, 3), cutoff_tumor_range = c(0, 3),
    cutoff_step = 0.1)

# find left over genes > 9000 (as a starting point)
num_gene_cutoff <- 9000;
num_gene_remaining_sub <- 
  num_gene_remaining_diff_cutoffs[num_gene_remaining_diff_cutoffs$num.gene.remaining > num_gene_cutoff, ]

cutoff_normal_range <- c(0.1, 1.7);
cutoff_tumor_range <- c(0, 2.1);

norm_counts_sub <- subset_sd(norm_counts_combined, normal_ids, tumor_ids, 
  cutoff_normal = cutoff_normal_range, cutoff_tumor = cutoff_tumor_range);

# Scale normalization at 75th percentile
norm_counts_sub <- scale_normalization_75th_percentile(norm_counts_sub);

## Deconvolution with demixt
set.seed(123)
data_Y <- 
  SummarizedExperiment(assays = list(counts = norm_counts_sub[, 1:ncol(norm_counts)]))
N1_start_col <- ncol(norm_counts) + 1 # 230
N1_end_col <- N1_start_col + ncol(pseudo_bulk_stromal) - 1 # 240
data_N1 <- 
  SummarizedExperiment(assays = list(counts = norm_counts_sub[, N1_start_col:N1_end_col]))
N2_start_col <- N1_end_col + 1 # 241
N2_end_col <- N2_start_col + ncol(pseudo_bulk_immune) - 1 # 251
data_N2 <- 
  SummarizedExperiment(assays = list(counts = norm_counts_sub[, N2_start_col:N2_end_col]))

res <- DeMixT(data_Y, data_N1, data_N2, niter = 5,
  nthread = 50,  if.filter = TRUE
)

#error occurs: Error in DeMixT_GS(data.Y = data.Y, data.N1 = data.N1, data.N2 = data.N2, : 
# DeMixT Estimation Failed, try different initial values for pi

Other information: Operating system: Fedora Linux version: 6.4.12-200.fc38.x86_64 R version 4.3.1 (2023-06-16) Platform: x86_64-conda-linux-gnu (64-bit) other attached packages: DeMixT_1.16.0

Thank you.

jiyunmaths commented 11 months ago

Hi @ncyue2, sorry for the error. I am trying to fix it. Can I first make sure your inputs are raw counts (e.g., generated by htseq-count)? Your input file names may indicate they are normalized. DeMixT requires inputs are raw counts. Thanks.

ncyue2 commented 11 months ago

Hi @jiyunmaths, thank you for the quick reply! Unfortunately I do not have the raw counts, all I have access to is the tpm rate; However, I converted the tpm rate to counts by reverse operation counts <- tpm*2600*10^{-6} where 2600 is the average gene length; similar to the solution in this post So my input data should be "imitating" the raw counts.

Extra: I estimates the proportion of stromal and immune components with scRNA-seq to be pi01 = 0.16 and pi02 = 0.63 respectively, but the same error occurs.

Thank you.

jiyunmaths commented 11 months ago

Hi @ncyue2, sorry I do not think it works to convert TPM to raw counts, since we lost the raw expression scale of each sample. Can I recommend to check if you can get access to the fastq or BAM file? If so, it will be much easier and more accurate to estimate the raw counts. Thanks.

ncyue2 commented 11 months ago

Hi @jiyunmaths, that's okay! I will see if I can get access to the fastq or BAM file! Thank you!