randel / MIND

Using Bulk Gene Expression to Estimate Cell-Type-Specific Gene Expression via Deconvolution
https://randel.github.io/MIND/

Empty object returned by MIND::get_prior() #4

Closed: dstueckm closed this issue 3 years ago

dstueckm commented 3 years ago

Hi there,

I'm very interested in this method and managed to get bMIND() running on some TCGA bulk RNA-seq data, using our own cell type proportions predicted from our scRNA-seq data. The results didn't make much sense, so I was hoping to improve the accuracy by supplying a scRNA-seq prior.

I couldn't find any examples in your tutorial or paper of how exactly the priors you used for the brain datasets were generated; if these exist, could you please point me to them? When I try to create a prior with the get_prior() function, the returned object contains only NULL values. Below is a reproducible example using random data (2 samples, 4 cell types, 100 cells, 100 genes):

# Random count matrix: 100 genes (rows) x 100 cells (columns)
sc_matrix <- matrix(sample(1:10, 10000, TRUE), ncol = 100, nrow = 100)
colnames(sc_matrix) <- as.character(1:100)
rownames(sc_matrix) <- paste0("Gene", 1:100)
# Cell metadata: each cell belongs to one of 2 samples and one of 4 cell types
sc_dataframe <- data.frame(sample = sample(1:2, 100, TRUE),
                           cell_type = rep(c("Endothelial", "Epithelial", "Myeloid", "Lymphoid"), 25))
prior <- MIND::get_prior(sc_matrix, sc_dataframe)

The "prior" object contains the expected cell type names, but instead of numeric values, its profile and covariance slots contain NULL. Am I doing something incorrectly?

Thank you, Daniel

randel commented 3 years ago

Hi Daniel,

Thanks for your interest in bMIND! The prior covariance matrix is estimated by aggregating the scRNA-seq data into sample-level pseudo-bulk cell-type-specific (CTS) expression (gene x sample x cell type). For each gene, this gives a sample x cell type matrix, from which we can calculate a cell type x cell type covariance matrix. This requires a decent number of samples in the scRNA-seq data. The code here may help explain it: https://github.com/randel/MIND/blob/master/R/bmind_func.r#L345
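The aggregation described above can be sketched in base R as follows. This is a minimal illustration, not MIND's code: `pseudo_bulk_cts()` and `cts_cov()` are hypothetical helpers, averaging the cells of each (sample, cell type) pair is an assumed aggregation (the package may aggregate or normalize differently), and the sketch assumes at least 2 samples and 2 cell types.

```r
# Hypothetical helper (not part of MIND): average the cells of each
# (sample, cell type) pair, giving a gene x sample x cell type array.
pseudo_bulk_cts <- function(sc, meta) {
  ct <- as.character(meta$cell_type)
  samples <- sort(unique(meta$sample))
  cell_types <- sort(unique(ct))
  cts <- array(NA_real_,
               dim = c(nrow(sc), length(samples), length(cell_types)),
               dimnames = list(rownames(sc), as.character(samples), cell_types))
  for (s in seq_along(samples)) {
    for (k in seq_along(cell_types)) {
      in_group <- meta$sample == samples[s] & ct == cell_types[k]
      # Pairs with no cells stay NA
      if (any(in_group)) cts[, s, k] <- rowMeans(sc[, in_group, drop = FALSE])
    }
  }
  cts
}

# Hypothetical helper: for each gene, the cell type x cell type covariance
# computed across samples (rows = samples, columns = cell types).
cts_cov <- function(cts) {
  n_genes <- dim(cts)[1]
  ct_names <- dimnames(cts)[[3]]
  out <- array(NA_real_, dim = c(n_genes, length(ct_names), length(ct_names)),
               dimnames = list(dimnames(cts)[[1]], ct_names, ct_names))
  for (g in seq_len(n_genes)) {
    out[g, , ] <- cov(cts[g, , ])
  }
  out
}

# Usage with random data: 100 genes, 400 cells, 10 samples, 4 cell types
set.seed(1)
sc <- matrix(rpois(100 * 400, 5), nrow = 100,
             dimnames = list(paste0("Gene", 1:100), paste0("Cell", 1:400)))
meta <- data.frame(sample = rep(1:10, each = 40),
                   cell_type = rep(c("Endothelial", "Epithelial", "Myeloid", "Lymphoid"), 100))
cts <- pseudo_bulk_cts(sc, meta)
dim(cts)    # 100 10 4
sigma <- cts_cov(cts)
dim(sigma)  # 100 4 4
```

Each `sigma[g, , ]` is estimated from only as many rows as there are scRNA-seq samples, which is why the number of samples matters so much here.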

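In your example, the NULL values appear because 2 samples cannot produce a positive-definite covariance matrix: a covariance matrix estimated from n samples has rank at most n - 1, so with 2 samples the 4 x 4 cell-type covariance has rank at most 1. If you only have a limited number of samples, you may use only the prior mean CTS expression.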
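The rank argument can be checked numerically with a small base-R sketch on hypothetical random data for a single gene. `is_pd_eigen()` is an illustrative helper (positive definite means every eigenvalue is strictly positive), not a MIND function, and the relative tolerance is an assumed choice to absorb floating-point round-off.

```r
# Illustrative helper: positive definite <=> all eigenvalues > 0,
# with a relative tolerance for round-off.
is_pd_eigen <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol * max(abs(ev)))
}

set.seed(42)
cell_types <- c("Endothelial", "Epithelial", "Myeloid", "Lymphoid")

# One gene's hypothetical pseudo-bulk CTS expression from 2 samples:
# the 4 x 4 covariance has rank at most 2 - 1 = 1
x2 <- matrix(rnorm(2 * 4), nrow = 2, dimnames = list(NULL, cell_types))
s2 <- cov(x2)
qr(s2)$rank      # 1
is_pd_eigen(s2)  # FALSE

# With 10 samples, continuous random data gives a full-rank (rank 4) covariance
x10 <- matrix(rnorm(10 * 4), nrow = 10, dimnames = list(NULL, cell_types))
is_pd_eigen(cov(x10))  # TRUE
```

In general, you need more scRNA-seq samples than cell types before a per-gene cell type x cell type covariance can be positive definite.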

Best, Jiebiao

dstueckm commented 3 years ago

Thanks for the response! I tried reducing the number of cell types and using all samples and it seems to work.

linanzhang commented 2 years ago

Hello, I couldn't get a non-empty prior with the ref_meta I have. The reference data has 9370 genes and 7796 cells. What could be going wrong? Thanks.

[Screenshot from 2021-11-14 attached to the original comment]

randel commented 2 years ago

This is because the CTS covariance matrix is not positive definite. You can change the following line

gene_pd = apply(cov, 1, is.positive.definite)  # TRUE for genes whose cell type x cell type covariance is positive definite

to

gene_pd = 1:nrow(cov)  # index every gene, skipping the positive-definiteness check
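To see what this one-line change does, here is a small base-R sketch with a hypothetical gene x cell type x cell type array `cov_arr`, in which genes 1 and 2 are built from only 2 samples. `pd_check()` is a hypothetical eigenvalue-based stand-in for `is.positive.definite()` (from the matrixcalc package), not MIND's code. How `gene_pd` is used further down is not shown in this thread; presumably it selects the genes kept in the prior, so a filter that rejects every gene would leave the prior empty.

```r
# Hypothetical stand-in for is.positive.definite(): all eigenvalues above tol
pd_check <- function(m, tol = 1e-8) {
  if (anyNA(m)) return(FALSE)
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

set.seed(7)
n_genes <- 5
k <- 3
# Hypothetical gene x cell type x cell type covariance array
cov_arr <- array(0, dim = c(n_genes, k, k))
for (g in seq_len(n_genes)) {
  n_samp <- if (g <= 2) 2 else 10  # genes 1-2 mimic too few samples
  x <- matrix(rnorm(n_samp * k), nrow = n_samp)
  cov_arr[g, , ] <- cov(x)
}

# Original line: one logical flag per gene
gene_pd <- apply(cov_arr, 1, pd_check)
gene_pd  # FALSE FALSE TRUE TRUE TRUE

# Suggested workaround: indices of all genes, regardless of the check
gene_pd_all <- 1:nrow(cov_arr)
gene_pd_all  # 1 2 3 4 5
```

Keeping every gene avoids an empty prior, but the covariance matrices of the rejected genes remain singular, so any later step that inverts them may need extra care.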

I will upload a revised version soon.