Problem using sq.pl.co_occurrence on own data

ofrag commented 9 months ago

Hi,

I'm trying to use SquidPy to analyze digital slide data, as a second step after cell segmentation and clustering with QuPath and Cellpose. I export the measurement table as a CSV file and import it into AnnData. I can successfully use:

sq.gr.spatial_neighbors followed by sq.pl.spatial_scatter to present the sub-graph associated with a given cell,
sq.gr.spatial_neighbors followed by sq.gr.nhood_enrichment and sq.pl.nhood_enrichment
sq.gr.interaction_matrix followed by sq.pl.interaction_matrix
sq.pl.spatial_scatter to show the cell classes as an image

However, when I run sq.gr.co_occurrence followed by sq.pl.co_occurrence using the code below, I get an error. Any hint on how to solve this would be appreciated. Also, is there another way to check whether the output of sq.gr.co_occurrence is OK?

Below is the code I use in my notebook, followed by the error message I get.

The problem seems to be connected to the size of the image: when I load the QuPath results for a cropped version of the image, the same code runs without any problem.

When I run sq.gr.co_occurrence on the bigger image, I get the following message: "WARNING: n_splits was automatically set to 48 to prevent 97173x97173 distance matrix from being created", but the function still runs all the way to 100%.
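For scale, here is a rough estimate of why a dense matrix of that size would be avoided. This is only arithmetic on the number in the warning; the helper name dense_distance_matrix_bytes is hypothetical, and the element size squidpy actually uses internally is an assumption, so both 4-byte and 8-byte floats are shown.

```python
def dense_distance_matrix_bytes(n_obs: int, itemsize: int) -> int:
    """Bytes needed to hold a dense n_obs x n_obs matrix with `itemsize`-byte elements."""
    return n_obs * n_obs * itemsize


n_cells = 97173  # number of cells, taken from the warning message

# The dtype used internally is an assumption, so report both common float sizes.
for dtype_name, itemsize in (("float32", 4), ("float64", 8)):
    gib = dense_distance_matrix_bytes(n_cells, itemsize) / 2**30
    print(f"{dtype_name}: {gib:.1f} GiB")
```

Either way the full matrix would need tens of GiB, which is consistent with the computation being split into chunks.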

Thanks Ofra

import squidpy as sq

# det_adata: AnnData built from the QuPath measurement table
sq.gr.co_occurrence(det_adata, cluster_key="Class", n_jobs=20)
sq.pl.co_occurrence(
    det_adata,
    cluster_key="Class",
    clusters=["CD3", "CD20", "TReg"],
    figsize=(15, 4),
)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'y'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[43], line 1
----> 1 sq.pl.co_occurrence(
      2     det_adata,
      3     cluster_key="Class",
      4     clusters=["CD3","CD20","TReg"],
      5     figsize=(15, 4),    
      6 )

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\squidpy\pl\_graph.py:381, in co_occurrence(adata, cluster_key, palette, clusters, figsize, dpi, save, legend_kwargs, **kwargs)
    378 df = pd.DataFrame(out[idx, :, :].T, columns=categories).melt(var_name=cluster_key, value_name="probability")
    379 df["distance"] = np.tile(interval, len(categories))
--> 381 sns.lineplot(
    382     x="distance",
    383     y="probability",
    384     data=df,
    385     dashes=False,
    386     hue=cluster_key,
    387     hue_order=categories,
    388     palette=palette,
    389     ax=ax,
    390     **kwargs,
    391 )
    392 ax.legend(**legend_kwargs)
    393 ax.set_title(rf"$\frac{{p(exp|{g})}}{{p(exp)}}$")

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\seaborn\relational.py:645, in lineplot(data, x, y, hue, size, style, units, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, estimator, errorbar, n_boot, seed, orient, sort, err_style, err_kws, legend, ci, ax, **kwargs)
    642 color = kwargs.pop("color", kwargs.pop("c", None))
    643 kwargs["color"] = _default_color(ax.plot, hue, color, kwargs)
--> 645 p.plot(ax, kwargs)
    646 return ax

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\seaborn\relational.py:459, in _LinePlotter.plot(self, ax, kws)
    457         lines.extend(ax.plot(unit_data["x"], unit_data["y"], **kws))
    458 else:
--> 459     lines = ax.plot(sub_data["x"], sub_data["y"], **kws)
    461 for line in lines:
    463     if "hue" in sub_vars:

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\pandas\core\frame.py:3761, in DataFrame.__getitem__(self, key)
   3759 if self.columns.nlevels > 1:
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):
   3763     indexer = [indexer]

File D:\Users\ofrag\.conda\envs\tangram\lib\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)

KeyError: 'y'
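On the side question of checking the output of sq.gr.co_occurrence without plotting: a minimal sketch, assuming the result is a mapping with an "occ" array and an "interval" array (the two pieces the plotting code in the traceback works from). The helper name summarize_co_occurrence is hypothetical, and the expected shapes are assumptions to verify against your squidpy version. A stand-in dictionary is used here so the snippet runs without squidpy.

```python
import numpy as np


def summarize_co_occurrence(result: dict, n_clusters: int) -> dict:
    """Basic sanity checks on a co-occurrence result (hypothetical helper)."""
    occ = np.asarray(result["occ"])
    interval = np.asarray(result["interval"])
    is_3d = occ.ndim == 3
    return {
        "occ_shape": occ.shape,
        # Assumed layout: (cluster, cluster, distance bin).
        "square_in_clusters": is_3d and occ.shape[0] == occ.shape[1] == n_clusters,
        # Assumed: interval holds either one value per bin or the bin edges.
        "interval_matches": is_3d and (len(interval) - occ.shape[2]) in (0, 1),
        "n_nonfinite": int((~np.isfinite(occ)).sum()),
        "interval_increasing": bool(np.all(np.diff(interval) > 0)),
    }


# Stand-in data with the assumed layout: 3 clusters, 49 bins, 50 bin edges.
fake_result = {
    "occ": np.ones((3, 3, 49), dtype=np.float32),
    "interval": np.linspace(1.0, 100.0, 50),
}
report = summarize_co_occurrence(fake_result, n_clusters=3)
print(report)
```

With real data, the dictionary would be read from the AnnData object. The squidpy documentation describes it as stored under adata.uns with a key of the form "<cluster_key>_co_occurrence" (here that would be det_adata.uns["Class_co_occurrence"]); treat that key as an assumption and confirm it by listing det_adata.uns.keys(). A large count of non-finite values or an unexpected shape would point at the computation rather than at the plotting step.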

giovp commented 9 months ago

hi @ofrag have you tried with a different version of seaborn?

zsfrbkv commented 7 months ago

having the same issue and running on seaborn 0.12.2.

giovp commented 7 months ago

hi @zsfrbkv, what about pandas?

zsfrbkv commented 7 months ago

@giovp hi! pandas is 2.0.3

giovp commented 7 months ago

can you upgrade to "pandas>=2.1.0"? that's what is in the requirements of squidpy
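A quick way to report the versions being asked about in this thread, using only the standard library. This is a sketch: the helper name installed_versions is hypothetical, and the package list is just the set of libraries that appear in the tracebacks above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["squidpy", "pandas", "seaborn", "matplotlib"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in the same environment (and the same kernel) as the failing notebook avoids reporting versions from a different conda environment.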

zsfrbkv commented 7 months ago

tried with pandas-2.1.3 -- got another error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[99], line 2
      1 for ct in adata.obs['cell_type_level_0'].unique():
----> 2     sq.pl.co_occurrence(
      3         adata_subsample,
      4         cluster_key="cell_type_level_0",
      5         clusters=ct,
      6         figsize=(10, 5),
      7     )
      9 for ct in adata.obs['cell_type_level_1'].unique():
     10     sq.pl.co_occurrence(
     11         adata_subsample,
     12         cluster_key="cell_type_level_1",
     13         clusters=ct,
     14         figsize=(10, 5),
     15     )

File miniconda3/envs/xenium/lib/python3.9/site-packages/squidpy/pl/_graph.py:378, in co_occurrence(adata, cluster_key, palette, clusters, figsize, dpi, save, legend_kwargs, **kwargs)
    376 for g, ax in zip(clusters, axs):
    377     idx = np.where(categories == g)[0][0]
--> 378     df = pd.DataFrame(out[idx, :, :].T, columns=categories).melt(var_name=cluster_key, value_name="probability")
    379     df["distance"] = np.tile(interval, len(categories))
    381     sns.lineplot(
    382         x="distance",
    383         y="probability",
   (...)
    390         **kwargs,
    391     )

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/frame.py:782, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    771         mgr = dict_to_mgr(
    772             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    773             # attribute "name"
   (...)
    779             copy=_copy,
    780         )
    781     else:
--> 782         mgr = ndarray_to_mgr(
    783             data,
    784             index,
    785             columns,
    786             dtype=dtype,
    787             copy=copy,
    788             typ=manager,
    789         )
    791 # For data is list-like, or Iterable (will consume into list)
    792 elif is_list_like(data):

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/internals/construction.py:332, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    323     values = sanitize_array(
    324         values,
    325         None,
   (...)
    328         allow_2d=True,
    329     )
    331 # _prep_ndarraylike ensures that values.ndim == 2 at this point
--> 332 index, columns = _get_axes(
    333     values.shape[0], values.shape[1], index=index, columns=columns
    334 )
    336 _check_values_indices_shape_match(values, index, columns)
    338 if typ == "array":

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/internals/construction.py:756, in _get_axes(N, K, index, columns)
    754     columns = default_index(K)
    755 else:
--> 756     columns = ensure_index(columns)
    757 return index, columns

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/indexes/base.py:7569, in ensure_index(index_like, copy)
   7567         return Index(index_like, copy=copy, tupleize_cols=False)
   7568 else:
-> 7569     return Index(index_like, copy=copy)

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/indexes/base.py:505, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols)
    502         return result.astype(dtype, copy=False)
    503     return result
--> 505 elif is_ea_or_datetimelike_dtype(dtype):
    506     # non-EA dtype indexes have special casting logic, so we punt here
    507     pass
    509 elif is_ea_or_datetimelike_dtype(data_dtype):

File miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/dtypes/common.py:1330, in is_ea_or_datetimelike_dtype(dtype)
   1322 def is_ea_or_datetimelike_dtype(dtype: DtypeObj | None) -> bool:
   1323     """
   1324     Check for ExtensionDtype, datetime64 dtype, or timedelta64 dtype.
   1325 
   (...)
   1328     Checks only for dtype objects, not dtype-castable strings or types.
   1329     """
-> 1330     return isinstance(dtype, ExtensionDtype) or (lib.is_np_dtype(dtype, "mM"))

AttributeError: module 'pandas._libs.lib' has no attribute 'is_np_dtype'
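The traceback shows pandas/core/dtypes/common.py calling lib.is_np_dtype, which the loaded pandas._libs.lib module does not provide. One possible explanation, which is an assumption and not confirmed in this thread, is that the Python sources and the compiled extension come from different pandas versions, for example after upgrading inside a running kernel without restarting it, or after a partially overwritten install. A small sketch to check whether a module exposes an attribute (the helper name module_has_attribute is hypothetical):

```python
import importlib


def module_has_attribute(module_name: str, attribute: str):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


# The module and attribute names are taken from the AttributeError above.
print(module_has_attribute("pandas._libs.lib", "is_np_dtype"))
```

If this prints False in a fresh kernel while the installed pandas version is reported as 2.1.x, a clean reinstall of pandas in that environment would be the next thing to try.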

zsfrbkv commented 7 months ago

downgraded to pandas=2.1.0 -- the issue resolved itself. thanks for the help!

scverse / squidpy

Problem using sq.pl.co_occurrence on own data #753