labsyspharm / scimap

Spatial Single-Cell Analysis Toolkit
https://scimap.xyz/
MIT License

sm.tl.spatial_lda stores result in adata.uns not adata.obs #24

Closed. marinabroz closed this issue 2 years ago

marinabroz commented 2 years ago

Hello,

I am trying to run sm.tl.spatial_lda with my adata object but for some reason the spatial_lda results are stored in adata.uns and not adata.obs. This is causing some errors with the clustering downstream using sm.tl.cluster. Any help would be appreciated!

I am working in Python v3.9.

After running the code:

adata = sm.tl.spatial_lda(adata, x_coordinate='X', y_coordinate='Y', phenotype='celltype', method='radius', radius=30, knn=10, imageid='UniqueID', num_motifs=10, random_state=0, subset=None, label='spatial_lda')

adata

AnnData object with n_obs × n_vars = 79308 × 34
    obs: 'Unnamed: 0', 'X', 'Y', 'Area', 'celltype', 'TLSType', 'UniqueID'
    uns: 'spatial_lda', 'spatial_lda_probability', 'spatial_lda_model'


ajitjohnson commented 2 years ago

Hi @marinabroz, thank you for bringing up this issue. I see that the documentation is incomplete here and a page is missing.

The results of the LDA analysis are supposed to be saved in adata.uns. To cluster them, use the sm.tl.spatial_cluster function rather than the regular sm.tl.cluster function. I see that this can be confusing and will try to merge the two into one function later on.

Please try the following. The documentation is now up at: https://scimap.xyz/All%20Functions/B.%20Tools/sm.tl.spatial_cluster/

adata = sm.tl.spatial_cluster(adata, df_name='spatial_lda', method='kmeans', k=15, label='spatial_lda_kmeans')

Please let me know if it works. Thank you.

marinabroz commented 2 years ago

Thank you for the quick reply! Using sm.tl.spatial_cluster worked for the clustering. Thanks again!

marinabroz commented 2 years ago

@ajitjohnson Does running sm.pl.cluster_plots (as below) generally take a long time? The kernel has been running for about 30 minutes without returning any results. I only have ~80k cells in my dataset. Am I missing something? Thanks!

sm.pl.cluster_plots(adata, group_by='spatial_lda_kmeans')

ajitjohnson commented 2 years ago

Hmm, I guess that depends on the machine. It should take a few minutes at most, not 30. Can you try including the subsample=1000 parameter to see whether size is the issue? cluster_plots is a wrapper and I do not use it much myself. If you are familiar with scanpy, you could use their UMAP and heatmap functions directly. The data is directly compatible and would not need any alterations.

Check out the UMAP plots I generate here: https://scimap.xyz/tutorials/2-scimap-tutorial-cell-phenotyping/

marinabroz commented 2 years ago

Thanks, I think I can see why it's not working. It seems that sm.tl.spatial_cluster returns NaN values for all of the cells. Have you encountered this before? When I look into the spatial_lda object, there are many zero values for the weights. Is this acceptable, or could it be causing this issue?


ajitjohnson commented 2 years ago

Hmm, that is difficult to say, as those are latent variables and not immediately interpretable. What kind of data are you using? Have you tried varying the radius parameter? Depending on the pixel size, you might have to change this. An easier first step would be to use the knn method rather than the radius method.

In any case, it does not explain why you would get NaN in your clustering result. Can you export adata.uns['spatial_lda'] and share it with me?
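A few quick checks on the motif table can narrow this down before sharing the data. The sketch below is only an illustration: it assumes adata.uns['spatial_lda'] is a pandas DataFrame indexed by cell ID, and the helper name, column names, and toy values are hypothetical. It counts rows containing NaN, rows whose weights are all zero, and cells in adata.obs that have no row in the table.

```python
import pandas as pd


def summarize_lda_table(lda, obs_index):
    """Basic sanity checks for a motif-weight table (hypothetical helper)."""
    obs_index = pd.Index(obs_index)
    return {
        # Rows where at least one motif weight is NaN.
        "rows_with_nan": int(lda.isna().any(axis=1).sum()),
        # Rows where every motif weight is zero (NaN is counted as zero here).
        "all_zero_rows": int((lda.fillna(0) == 0).all(axis=1).sum()),
        # Cells present in obs but absent from the motif table.
        "cells_missing_from_table": int((~obs_index.isin(lda.index)).sum()),
    }


# Toy stand-ins for adata.uns['spatial_lda'] and adata.obs.index.
lda = pd.DataFrame(
    {"motif_0": [0.9, 0.0, 0.0], "motif_1": [0.1, 0.0, 1.0]},
    index=["cell_1", "cell_2", "cell_3"],
)
obs_index = ["cell_1", "cell_2", "cell_3", "cell_4"]

report = summarize_lda_table(lda, obs_index)
print(report)
```

On a real object, the call would be summarize_lda_table(adata.uns['spatial_lda'], adata.obs.index). A non-zero count of missing cells would point to an index mismatch rather than to the zero weights themselves.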

marinabroz commented 2 years ago

I am working with imaging mass cytometry data, and my tissues contain tumor and immune cells. I have attached the data from adata.uns['spatial_lda'] here. For this run I have increased the radius to 100. Thank you!

spatial_lda.csv

ajitjohnson commented 2 years ago

Thank you. Just running your raw spatial_lda.csv through the functions works fine. Unfortunately, I would need your entire adata object to debug any further. If you are sending it over please save the h5ad file with everything that you ran right before the step where you run the spatial_cluster function (which ultimately gives you the error).

marinabroz commented 2 years ago

I appreciate your help! This is my first time working in Python, so I'm struggling a bit. I have attached the h5 file. I did receive this error when attempting adata.write; let me know if this is acceptable. Thank you!

NotImplementedError: Failed to write value for uns/spatial_lda_model, since a writer for type <class 'gensim.models.ldamulticore.LdaMulticore'> has not been implemented yet

adata.zip
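The error above says that the fitted gensim model stored under uns/spatial_lda_model has no writer. One possible workaround, sketched here under assumptions, is to take that entry out of adata.uns before writing and put it back afterwards. The helper name and the output file name are hypothetical, and a SimpleNamespace stands in for the AnnData object so the sketch runs without scimap installed.

```python
from types import SimpleNamespace


def pop_unwritable(adata, keys=("spatial_lda_model",)):
    """Remove entries from adata.uns that cannot be serialized and return them."""
    removed = {}
    for key in keys:
        if key in adata.uns:
            removed[key] = adata.uns.pop(key)
    return removed


# Stand-in for an AnnData object; only the .uns mapping matters for this sketch.
adata = SimpleNamespace(uns={
    "spatial_lda": "motif weights table",
    "spatial_lda_probability": "probability table",
    "spatial_lda_model": object(),  # plays the role of the gensim LdaMulticore model
})

removed = pop_unwritable(adata)
# With a real AnnData object, the write would go here (hypothetical file name):
# adata.write("before_spatial_cluster.h5ad")
adata.uns.update(removed)  # restore the model in memory after writing
```

The saved file then lacks the model but keeps the 'spatial_lda' table, which is the entry that sm.tl.spatial_cluster reads through df_name.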

ajitjohnson commented 2 years ago

No problem. Hmm, it works fine for me. Can you run pip install scimap --upgrade to make sure you are on the latest version and try again?

Also, I would suggest grouping your cell types into larger categories so that you have fewer cell types to run through these algorithms. You have too many variables for a limited field of view and number of cells, and the results will be more interpretable with broader groups. Once you have identified the spatial domains, you can always look into them with a finer cell type classification.
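A minimal sketch of that grouping step with pandas follows. The fine and broad labels are made up for illustration, as is the 'celltype_broad' column name; the DataFrame stands in for adata.obs with the 'celltype' column used earlier in this thread.

```python
import pandas as pd

# Hypothetical fine-to-broad mapping; replace with the labels in your own data.
broad_groups = {
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "Treg": "T cell",
    "M1 macrophage": "Myeloid",
    "M2 macrophage": "Myeloid",
    "Tumor": "Tumor",
}

# Stand-in for adata.obs.
obs = pd.DataFrame({"celltype": ["CD4 T cell", "M2 macrophage", "Tumor", "B cell"]})

# Labels missing from the mapping are kept as they are rather than becoming NaN.
obs["celltype_broad"] = obs["celltype"].map(broad_groups).fillna(obs["celltype"])
print(obs)
```

With the new column in adata.obs, sm.tl.spatial_lda could then be rerun with phenotype='celltype_broad' in place of phenotype='celltype'.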

marinabroz commented 2 years ago

Hi Ajit, sorry for the late reply! Another member in my lab played around with it, and she was getting NaN at first as well. However, when she ran the following we were able to return cluster numbers:

adata = sm.tl.spatial_cluster(adata, random_state=0, df_name='spatial_lda')

Also I appreciate your input on the groupings of cells, we gave that a try as well and I'm pretty happy with the results. Thank you for all of your help!

ajitjohnson commented 2 years ago

No problem. Glad it worked out :)