labsyspharm / scimap

Spatial Single-Cell Analysis Toolkit
https://scimap.xyz/
MIT License

spatial_interaction - issues with rare cell type handling #117

Open gesavoigt opened 4 weeks ago

gesavoigt commented 4 weeks ago

Hi,

I am using scimap 2.1.3, and noticed that an error is being thrown when using spatial_interaction with a rare cell phenotype.

I have constructed an AnnData object with multiple ROIs, one of which contains only a single cell of a certain phenotype ('C' here). Calling spatial_interaction on it raises ValueError: operands could not be broadcast together with shapes (9,) (6,). Here is a minimal example:

import anndata as ad
import numpy as np
import pandas as pd
import scimap as sm
# One image with ten cells; phenotype 'C' occurs only once
data = pd.DataFrame({'X_centroid': np.random.normal(size=10)**2,
                     'Y_centroid': np.random.normal(size=10)**2,
                     'phenotype': ['A','B','A','B','A','B','A','B','A','C'],
                     'imageid': 'imageid'})
adata = ad.AnnData(obs=data)
sm.tl.spatial_interaction(adata, method='knn', knn=2, pval_method='zscore')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[129], line 7
      2 data = pd.DataFrame({'X_centroid': np.random.normal(size=10)**2,
      3                      'Y_centroid': np.random.normal(size=10)**2,
      4                      'phenotype': ['A','B','A','B','A','B','A','B','A','C'],
      5                      'imageid': 'imageid'})
      6 adata = ad.AnnData(obs=data)
----> 7 sm.tl.spatial_interaction(adata, method='knn', knn=2, pval_method='zscore')

File ~/.py_venv/scimap/lib/python3.10/site-packages/scimap/tools/spatial_interaction.py:250, in spatial_interaction(adata, x_coordinate, y_coordinate, z_coordinate, phenotype, method, radius, knn, permutation, imageid, subset, pval_method, verbose, label)
    246 # Apply function to all images and create a master dataframe
    247 # Create lamda function
    248 r_spatial_interaction_internal = lambda x: spatial_interaction_internal (adata_subset=x, x_coordinate=x_coordinate, y_coordinate=y_coordinate,
    249                                                                          z_coordinate=z_coordinate, phenotype=phenotype, method=method,  radius=radius, knn=knn, permutation=permutation, imageid=imageid,subset=subset,pval_method=pval_method)
--> 250 all_data = list(map(r_spatial_interaction_internal, adata_list)) # Apply function
    253 # Merge all the results into a single dataframe
    254 df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['phenotype', 'neighbour_phenotype'], how='outer'), all_data)

File ~/.py_venv/scimap/lib/python3.10/site-packages/scimap/tools/spatial_interaction.py:248, in spatial_interaction.<locals>.<lambda>(x)
    243     adata_list = [adata[adata.obs[imageid] == i] for i in adata.obs[imageid].unique()]
    246 # Apply function to all images and create a master dataframe
    247 # Create lamda function
--> 248 r_spatial_interaction_internal = lambda x: spatial_interaction_internal (adata_subset=x, x_coordinate=x_coordinate, y_coordinate=y_coordinate,
    249                                                                          z_coordinate=z_coordinate, phenotype=phenotype, method=method,  radius=radius, knn=knn, permutation=permutation, imageid=imageid,subset=subset,pval_method=pval_method)
...
File ~/.py_venv/scimap/lib/python3.10/site-packages/pandas/core/roperator.py:19, in rmul(left, right)
     18 def rmul(left, right):
---> 19     return right * left

ValueError: operands could not be broadcast together with shapes (9,) (6,)

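I suspect the error arises like this: missing interactions between phenotypes are filled in for the object k via columns_to_add, but not for the object n_freq, which therefore lacks the pairs that have the single-cell phenotype as a neighbour. This would be consistent with the shapes in the error message: with three phenotypes, presumably 3 x 3 = 9 entries on one side and 3 x 2 = 6 on the other.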
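To illustrate the suspected mismatch outside of scimap, here is a minimal pandas-only sketch. The toy neighbour table n, the variable names, and the reindexing step are my own assumptions for illustration and are not taken from the scimap source; the point is only that counting observed pairs gives 6 entries when 'C' never appears as a neighbour, while a full phenotype-by-phenotype grid has 9.

```python
import pandas as pd

# Hypothetical neighbour table: 'C' has neighbours, but no cell has 'C' as a neighbour.
n = pd.DataFrame({
    'phenotype':           ['A', 'A', 'B', 'B', 'C', 'C'],
    'neighbour_phenotype': ['B', 'A', 'A', 'B', 'A', 'B'],
})

# Counting only the observed pairs yields 6 entries (no '* -> C' pairs).
counts = n.groupby(['phenotype', 'neighbour_phenotype']).size()

# Build the full square grid of phenotype pairs and fill missing pairs with 0.
labels = sorted(set(n['phenotype']) | set(n['neighbour_phenotype']))
full_index = pd.MultiIndex.from_product(
    [labels, labels], names=['phenotype', 'neighbour_phenotype']
)
n_freq = counts.reindex(full_index, fill_value=0)

print(len(counts), len(n_freq))
```

If n_freq were expanded to the full grid in a similar way, both operands would have the same length regardless of how rare a phenotype is. Whether this matches the intended semantics inside spatial_interaction is something the maintainers would need to confirm.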

I could avoid the error by excluding the cell type from this imageid entirely. However, removing the cell changes the neighbourhoods of the surrounding cells, so this is not an ideal solution.

Another note: in cases like this, where columns_to_add is not empty, the 'phenotype' column must explicitly be of type string. Integer values in that column raise an error at the line k = k.assign(**columns_to_add): TypeError: keywords must be strings. A workaround is to append a string character to every phenotype label that could be coerced to int.
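The TypeError can be reproduced with pandas alone. In the sketch below, only the line k.assign(**columns_to_add) is quoted from the traceback context above; the toy table and the way columns_to_add is built are assumptions on my part. It also shows two possible ways around the problem: casting the labels to strings first, or using reindex, which does not pass labels as keyword arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical neighbour table with integer phenotype labels;
# label 3 never appears as a neighbour.
n = pd.DataFrame({'phenotype': [1, 1, 2, 3],
                  'neighbour_phenotype': [1, 2, 1, 1]})
k = n.groupby(['phenotype', 'neighbour_phenotype']).size().unstack().fillna(0)

# Assumed construction of columns_to_add: labels in the rows but not the columns.
columns_to_add = dict.fromkeys(np.setdiff1d(k.index, k.columns), 0)

# Unpacking non-string keys as keyword arguments fails.
try:
    k.assign(**columns_to_add)
    assign_failed = False
except TypeError:
    assign_failed = True

# Option 1: cast the labels to strings before building the table.
n_str = n.astype(str)
k_str = n_str.groupby(['phenotype', 'neighbour_phenotype']).size().unstack().fillna(0)
k_str = k_str.assign(**dict.fromkeys(np.setdiff1d(k_str.index, k_str.columns), 0))

# Option 2: reindex the columns, which works for labels of any type.
k_square = k.reindex(columns=list(k.index), fill_value=0)

print(assign_failed, k_str.shape, k_square.shape)
```

On the user side, casting the column before the call, for example adata.obs['phenotype'] = adata.obs['phenotype'].astype(str), should have the same effect as option 1.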