plevritis-lab / CELESTA

Automate unsupervised machine learning cell type identification using both protein expressions and cell spatial neighborhood information for multiplexed in situ imaging data. No training dataset with cell type labels is required.
Apache License 2.0

Error with CreateCelestaObject #19

Open HMgalloway opened 1 year ago

HMgalloway commented 1 year ago

Hi, I am currently experiencing an error with the CreateCelestaObject step. When I run CreateCelestaObject() I get warnings that my marker expression is potentially too sparse:

Marker: Cd19
[1] "Warning: The marker expression potentially has too many zeros for \n fitting. GMM fitting will use input expression data with reduced \n sparsity"

After that I get the following error:

Error in xxx@results[[1]] : subscript out of bounds
In addition: Warning message:
In mixmodCluster(marker_exp, 2, models = mixmodGaussianModel(family = "general", :
  All models got errors!

Furthermore, when I remove the markers that I get warnings on, more warnings appear for different markers. I am working with a spatially resolved single-cell RNA-seq dataset with normalized marker expression.

Naively, I set all my cell types to be assigned in the first round just to run through the approach once. To the best of my knowledge, the formatting of the signature matrix and of the imaging data matrix is correct. I have attached them so this can be verified.

Any help is greatly appreciated!

celesta_signature_file_naive.csv celesta_norm_expression.csv

weiruo16 commented 1 year ago

I am very sorry for the late response; I hope this is still helpful in some way. The Cd19 expression values are indeed very sparse: the majority are zeros, which is where the warning comes from. Such sparsity may reduce accuracy somewhat. I think the error comes from the normalized expression values. CELESTA has its own normalization step as part of the GMM fitting, so the input expression values should come directly from segmentation, without any normalization. Other normalization methods may have disrupted the GMM assumptions. I would recommend running it on un-normalized expression values (no log transformation, scaling, etc.).
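
For anyone hitting the same error, a quick base-R check of the input table before calling CreateCelestaObject() may help. The sketch below is only an illustration: `summarise_markers` is a hypothetical helper, not part of CELESTA, and the toy data frame and its values are made up. It reports, per marker column, the fraction of zeros (the sparsity the warning refers to) and whether any value is negative, which is a rough hint, not proof, that the data have already been scaled or centered. The thread does not state what zero fraction triggers the warning, so no threshold is applied here.

```r
# Hypothetical helper (not part of CELESTA): summarise each marker column
# so that sparse or possibly pre-normalized inputs are easy to spot.
summarise_markers <- function(expr, marker_cols) {
  data.frame(
    marker = marker_cols,
    # Fraction of cells with exactly zero expression for this marker.
    zero_fraction = vapply(
      marker_cols,
      function(m) mean(expr[[m]] == 0, na.rm = TRUE),
      numeric(1), USE.NAMES = FALSE
    ),
    # Negative values are a heuristic sign of centering/scaling.
    has_negative = vapply(
      marker_cols,
      function(m) any(expr[[m]] < 0, na.rm = TRUE),
      logical(1), USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for an expression table (all values are made up).
toy <- data.frame(
  X    = c(1, 2, 3, 4),
  Y    = c(1, 2, 3, 4),
  Cd19 = c(0, 0, 0, 2.5),
  Cd3e = c(-0.4, 1.2, 0.3, -1.1)
)

report <- summarise_markers(toy, c("Cd19", "Cd3e"))
print(report)
```

On real data, the same call could be pointed at the table read from the attached expression CSV, passing the marker column names. A marker with a very high `zero_fraction`, or any marker with `has_negative` set, would be worth inspecting before running the GMM fitting.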