saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

KeyError: None when using top_n parameter in li.pl.dotplot #109

Closed WeipengMO closed 1 month ago

WeipengMO commented 1 month ago

Describe the bug

When I pass the top_n parameter to li.pl.dotplot, it raises KeyError: None. I'm using the following code:

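# imports assumed from the aliases used below
import scanpy as sc
import liana as li
from liana.method import cellphonedb

adata = sc.datasets.pbmc68k_reduced()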

# run cellphonedb
cellphonedb(adata, groupby='bulk_labels', expr_prop=0.1, resource_name='consensus', verbose=True, key_added='cpdb_res')

li.pl.dotplot(adata = adata,
              colour='lr_means',
              size='cellphone_pvals',
              inverse_size=True, # we inverse sign since we want small p-values to have large sizes
              source_labels=['CD34+', 'CD56+ NK', 'CD14+ Monocyte'],
              target_labels=['CD34+', 'CD56+ NK'],
              figure_size=(8, 7),
              top_n=5,
              # finally, since cpdbv2 suggests filtering to reduce false positives,
              # we keep only rows whose pvals are <= 0.05
              filter_fun=lambda x: x['cellphone_pvals'] <= 0.05,
              uns_key='cpdb_res' # uns_key to use, default is 'liana_res'
             )

It raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3652         try:
-> 3653             return self._engine.get_loc(casted_key)
   3654         except KeyError as err:

7 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: None

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-14-d88f7a422161> in <cell line: 1>()
----> 1 li.pl.dotplot(adata = adata,
      2               colour='lr_means',
      3               size='cellphone_pvals',
      4               inverse_size=True, # we inverse sign since we want small p-values to have large sizes
      5               source_labels=['CD34+', 'CD56+ NK', 'CD14+ Monocyte'],

/usr/local/lib/python3.10/dist-packages/liana/plotting/_dotplot.py in dotplot(adata, uns_key, liana_res, colour, size, source_labels, target_labels, top_n, orderby, orderby_ascending, orderby_absolute, filter_fun, ligand_complex, receptor_complex, inverse_colour, inverse_size, cmap, size_range, figure_size, return_fig)
     76 
     77     liana_res = _filter_by(liana_res, filter_fun)
---> 78     liana_res = _get_top_n(liana_res, top_n, orderby, orderby_ascending, orderby_absolute)
     79 
     80     # inverse sc if needed

/usr/local/lib/python3.10/dist-packages/liana/plotting/_common.py in _get_top_n(liana_res, top_n, orderby, orderby_ascending, orderby_absolute)
     89             how = 'max'
     90 
---> 91         top_lrs = _aggregate_scores(liana_res,
     92                                     what=orderby,
     93                                     how=how,

/usr/local/lib/python3.10/dist-packages/liana/plotting/_common.py in _aggregate_scores(res, what, how, absolute, entities)
     58 
     59 def _aggregate_scores(res, what, how, absolute, entities):
---> 60     res['score'] = np.absolute(res[what]) if absolute else res[what]
     61     res = res.groupby(entities).agg(score=('score', how)).reset_index()
     62     return res

/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   3759             if self.columns.nlevels > 1:
   3760                 return self._getitem_multilevel(key)
-> 3761             indexer = self.columns.get_loc(key)
   3762             if is_integer(indexer):
   3763                 indexer = [indexer]

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3653             return self._engine.get_loc(casted_key)
   3654         except KeyError as err:
-> 3655             raise KeyError(key) from err
   3656         except TypeError:
   3657             # If we have a listlike key, _check_indexing_error will raise

KeyError: None

The error occurs whenever I set top_n (top_n=5 in the code above). If I remove this parameter, the plot is generated without errors.
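
The last LIANA frame in the traceback is `_aggregate_scores`, which evaluates `res[what]` with `what=orderby`. A `KeyError: None` at that point suggests `orderby` was still `None` when `top_n` was applied. The following is a minimal sketch that reproduces the same failure with plain pandas; the toy table and the standalone `aggregate_scores` copy are illustrative assumptions, not LIANA's API.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a LIANA results table; the values are made up.
res = pd.DataFrame({
    "ligand_complex": ["A", "A", "B"],
    "receptor_complex": ["X", "X", "Y"],
    "lr_means": [1.0, 3.0, 2.0],
})
entities = ["ligand_complex", "receptor_complex"]


def aggregate_scores(res, what, how, absolute, entities):
    # Mirrors the logic shown in the `_aggregate_scores` traceback frame,
    # working on a copy so the input table is left unchanged.
    res = res.copy()
    res["score"] = np.absolute(res[what]) if absolute else res[what]
    return res.groupby(entities).agg(score=("score", how)).reset_index()


# With what=None (i.e. no `orderby`), the column lookup fails.
failure = None
try:
    aggregate_scores(res, what=None, how="max", absolute=False, entities=entities)
except KeyError as err:
    failure = err
    print("lookup failed:", repr(err))

# With a real column name, the aggregation works as intended.
top = aggregate_scores(res, what="lr_means", how="max", absolute=False, entities=entities)
print(top)
```

Indexing a DataFrame with `None` is treated as a lookup for a column named `None`, which is why the message contains no hint about the missing `orderby` argument.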

session_info

-----
anndata             0.10.6
decoupler           1.6.0
liana               1.1.0
loguru              0.7.2
matplotlib          3.8.1
numpy               1.23.5
pandas              2.1.3
plotnine            0.12.4
scanpy              1.9.8
seaborn             0.13.2
session_info        1.0.0
-----

WeipengMO commented 1 month ago

I've found the solution to this issue. The top_n parameter requires additional arguments orderby, orderby_ascending, and orderby_absolute to function correctly.

https://github.com/saezlab/liana-py/blob/eeb9cdbf29e93e0bc4a95e6d9e4d645988d09841/liana/plotting/_common.py#L78
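
As a sketch of the fix described above, the original call can be extended with the three ordering arguments. The parameter names come from the `dotplot` signature in the traceback; the choice of `orderby='lr_means'` with `orderby_ascending=False` is an assumption (rank by the largest mean expression first), and another score column may suit a given analysis better. The arguments are collected in a dict so the snippet runs without the dataset loaded.

```python
# Keyword arguments for li.pl.dotplot, based on the call in the report.
dotplot_kwargs = dict(
    colour="lr_means",
    size="cellphone_pvals",
    inverse_size=True,  # small p-values should map to large dots
    source_labels=["CD34+", "CD56+ NK", "CD14+ Monocyte"],
    target_labels=["CD34+", "CD56+ NK"],
    figure_size=(8, 7),
    top_n=5,
    # The three arguments that top_n relies on (values are assumptions):
    orderby="lr_means",       # column used to rank interactions
    orderby_ascending=False,  # larger lr_means ranks first
    orderby_absolute=False,   # rank on raw rather than absolute values
    filter_fun=lambda x: x["cellphone_pvals"] <= 0.05,
    uns_key="cpdb_res",
)

# With `adata` prepared as in the report, the plot would be produced by:
# li.pl.dotplot(adata=adata, **dotplot_kwargs)
```

For rank-like scores where smaller is better, `orderby_ascending=True` would be the natural choice instead.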

dbdimitrov commented 1 month ago

Hi @WeipengMO, I'll add a custom exception to make this more informative.
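
A more informative exception could look roughly like the sketch below. The helper name `check_top_n_args` and the error messages are hypothetical and are not taken from the LIANA code base; the idea is only to validate the ordering arguments before any column lookup happens.

```python
def check_top_n_args(top_n, orderby, orderby_ascending):
    """Hypothetical guard; not the actual LIANA implementation."""
    if top_n is None:
        # Nothing to validate when no top_n filtering is requested.
        return
    if orderby is None:
        raise ValueError(
            "`top_n` requires `orderby`: the column used to rank interactions."
        )
    if orderby_ascending is None:
        raise ValueError(
            "`top_n` requires `orderby_ascending` to be True or False."
        )


# Example: the call from the report would now fail with a clear message.
try:
    check_top_n_args(top_n=5, orderby=None, orderby_ascending=None)
except ValueError as err:
    message = str(err)
    print(message)
```

Failing early with a `ValueError` names the missing argument directly, instead of surfacing a pandas `KeyError: None` several frames deeper.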