xuranw / MuSiC

Multi-subject Single Cell Deconvolution
https://github.com/xuranw/MuSiC
GNU General Public License v3.0

Validation of cell type estimation with prior knowledge #58

Open Tushar-87 opened 3 years ago

Tushar-87 commented 3 years ago

I deconvolved 12 bulk RNA-seq samples from the brains of mice subjected to stroke. Each brain was separated into two hemispheres: the ipsilateral hemisphere, where the ischemia took place and we expect neuronal death and inflammation, and the contralateral hemisphere, where we do not expect any neuronal death or inflammation. The single-cell reference data consisted of four samples and 13 cell types. Endothelial cells (50%) and microglia (10-30%) were the main cell types, along with neurons, astrocytes, smooth muscle cells, etc. I used the music_prop function to estimate cell type proportions in the bulk data. Surprisingly, the endothelial and microglial proportions were 0 in all 12 bulk samples, which seems unlikely. Moreover, although neurons came out as the major constituent of the bulk samples (80-90%), the neuronal proportion was no lower in the ipsilateral hemisphere than in the contralateral one. Given this prior knowledge, the cell type estimation appears to have failed completely. What could have gone wrong here? I followed the tutorial and created the expression sets from raw (not normalized) counts. The code used for cell type estimation:

bulk.prop <- music_prop(bulk.eset = bulk.est, sc.eset = sc.est, markers = NULL, clusters = 'ident', samples = 'orig.ident', verbose = TRUE)

The benchmark evaluation also seems unsatisfactory: the estimated percentages ranged from 0% to 280%.
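If the benchmark reports each estimate as a percentage of a known true proportion (an assumption; the post does not say how the percentages were computed), the score can swing widely for rare cell types even when absolute errors are small. A minimal base-R sketch with made-up, hypothetical proportions for four generic cell types:

```r
# Hypothetical true and estimated proportions (made-up numbers, each sums to 1)
true_prop <- c(typeA = 0.60, typeB = 0.25, typeC = 0.10, typeD = 0.05)
est_prop  <- c(typeA = 0.56, typeB = 0.30, typeC = 0.00, typeD = 0.14)

# Estimate expressed as a percentage of the true value (100% = perfect)
pct_of_true <- 100 * est_prop / true_prop
# Absolute error in percentage points
abs_err_pp <- 100 * abs(est_prop - true_prop)

print(round(rbind(pct_of_true, abs_err_pp), 1))
```

Here the ratio spans 0% to 280% while no absolute error exceeds 10 percentage points; the 280% comes from the rarest type (5% true, 14% estimated). Reporting both measures side by side makes it easier to tell a rare-type artifact from a genuine failure.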

tjbutler003 commented 3 years ago

Hi Tushar, I just wanted to note that I have also had cell types estimated at 0%, even though flow cytometry of the same samples shows them at low but non-zero proportions. Previously, highly correlated cell types were being estimated at 0%, but using the two-step MuSiC algorithm mostly fixed this for me. However, one cell type is still consistently estimated at 0% even though it forms its own cell group. I wonder whether this might be connected. How correlated are the cell types that are estimated at 0% with the other available cell types?
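One way to answer that question is to correlate the average expression profiles of the reference cell types. A minimal base-R sketch, assuming a genes-by-cells raw count matrix and one cell-type label per cell (for example the labels passed as `clusters` to `music_prop`); the helper names `celltype_profile_cor` and `flag_correlated_pairs`, the toy data, and the 0.9 threshold are all hypothetical:

```r
# Hypothetical helper: Pearson correlation between cell-type mean profiles.
# counts: genes x cells matrix of raw counts; cell_types: one label per cell.
celltype_profile_cor <- function(counts, cell_types) {
  # Convert each cell to relative abundance so library size does not dominate
  rel <- sweep(counts, 2, colSums(counts), "/")
  types <- sort(unique(cell_types))
  # Mean relative expression per cell type (genes x cell types)
  prof <- sapply(types, function(ct) rowMeans(rel[, cell_types == ct, drop = FALSE]))
  cor(prof)
}

# Hypothetical helper: list cell-type pairs correlated at or above `threshold`
flag_correlated_pairs <- function(cor_mat, threshold = 0.9) {
  idx <- which(upper.tri(cor_mat) & cor_mat >= threshold, arr.ind = TRUE)
  data.frame(type1 = rownames(cor_mat)[idx[, 1]],
             type2 = colnames(cor_mat)[idx[, 2]],
             r = cor_mat[idx])
}

# Toy data: 4 genes x 6 cells; types A and B share a profile, C differs
counts <- matrix(c(10, 5,  1,  0,
                   12, 6,  1,  1,
                   11, 5,  2,  0,
                   10, 6,  1,  1,
                    0, 1, 10,  9,
                    1, 0,  9, 10),
                 nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("cell", 1:6)))
cell_types <- c("A", "A", "B", "B", "C", "C")

cm <- celltype_profile_cor(counts, cell_types)
print(round(cm, 2))
print(flag_correlated_pairs(cm, threshold = 0.9))  # only the A-B pair
```

Cells with zero total counts would produce `NaN` in the relative-abundance step, so they should be filtered out first. Pairs flagged this way are candidates for grouping in a two-step estimation.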

tjbutler003 commented 3 years ago

Here is an example from my data set, showing that some groups of cell types have up to 90% correlation with each other. This seemed to produce quite a few 0% estimates, apparently because the proportion of one cell type was attributed to another, highly correlated one.
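The effect can be reproduced on a toy example. The sketch below uses plain non-negative least squares (NNLS), not MuSiC's own estimator, which additionally weights genes and scales by cell size; the brute-force solver `nnls_bruteforce` and the reference profiles are hypothetical. Two cell types whose profiles differ in a single gene (Pearson r of about 0.997) are mixed 40/40 with a third type at 20%. Dropping that one gene's bulk value from 0.4 to 0 moves the whole shared proportion onto one type and sets the other to 0.

```r
# Hypothetical helper: exact NNLS for a handful of columns. Try every support,
# fit ordinary least squares on it, and keep the lowest-residual fit whose
# coefficients are all >= 0. Illustration only; use a real solver on real data.
nnls_bruteforce <- function(S, y) {
  k <- ncol(S)
  best <- rep(0, k)
  best_rss <- sum(y^2)  # residual of the all-zero solution
  for (m in seq_len(k)) {
    for (cols in combn(k, m, simplify = FALSE)) {
      X <- S[, cols, drop = FALSE]
      beta <- qr.solve(X, y)  # least-squares fit on this subset of cell types
      if (all(beta >= 0)) {
        rss <- sum((y - X %*% beta)^2)
        if (rss < best_rss) {
          best <- rep(0, k)
          best[cols] <- beta
          best_rss <- rss
        }
      }
    }
  }
  setNames(best, colnames(S))
}

# Hypothetical reference profiles (4 genes x 3 cell types);
# typeA and typeB differ only in gene g3
S <- cbind(typeA = c(10, 10, 0, 0),
           typeB = c(10, 10, 1, 0),
           typeC = c(0, 0, 0, 10))
rownames(S) <- paste0("g", 1:4)
cor(S[, "typeA"], S[, "typeB"])  # about 0.997

true_p  <- c(0.4, 0.4, 0.2)
y_clean <- drop(S %*% true_p)  # bulk values 8, 8, 0.4, 2
y_noisy <- y_clean
y_noisy["g3"] <- 0             # the single distinguishing gene reads 0

round(nnls_bruteforce(S, y_clean), 3)  # typeA 0.4, typeB 0.4, typeC 0.2
round(nnls_bruteforce(S, y_noisy), 3)  # typeA 0.8, typeB 0.0, typeC 0.2
```

Because only g3 separates typeA from typeB, the split between them rests entirely on that gene, and a small change there zeroes one type while the fit to the other genes stays perfect. Merging such types into one group for a first estimation step avoids relying on so little information.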

kangxige commented 3 years ago

Hi, I have the same problem when using the MuSiC R package. Did you solve it? My scRNA-seq reference has 13 cell types, but when I run MuSiC on TCGA bulk RNA-seq, only 10 of them receive non-zero estimates; the prop.weighted value of the other 3 cell types is 0. Yet the marker genes of these three zero-proportion cell types are very highly expressed. Thanks a lot.

Tushar-87 commented 3 years ago

@tjbutler003 @kangxige I could not solve this issue. It is still open.