plot_cpdb KeyError - GitHub Issues

hmassalha commented 8 months ago

Hello, thanks for this package to explore cpdb output.

I am facing the following issue with plot_cpdb function:

import ktplotspy as kpy

pp = kpy.plot_cpdb(
        adata = adata,
        cell_type1 = '.',
        cell_type2 = '.', 
        means = means,
        pvals = relevant_interactions,
        celltype_key = "celltype",
        figsize = (7,10),
        max_size = 6,
        highlight_size = 0.75,
        degs_analysis = True,
        standard_scale = True,
    )
pp

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [42], in <module>
      1 import ktplotspy as kpy
----> 3 pp = kpy.plot_cpdb(
      4         adata = adata,
      5         cell_type1 = '.',
      6         cell_type2 = '.', 
      7         means = means,
      8         pvals = relevant_interactions,
      9         celltype_key = "celltype",
     10         figsize = (7,10),
     11         max_size = 6,
     12         highlight_size = 0.75,
     13         degs_analysis = True,
     14         standard_scale = True,
     15     )
     16 pp

File ~/my-conda-envs/sc_Harm/lib/python3.10/site-packages/ktplotspy/plot/plot_cpdb.py:218, in plot_cpdb(adata, cell_type1, cell_type2, means, pvals, celltype_key, degs_analysis, splitby_key, alpha, keep_significant_only, genes, gene_family, custom_gene_family, standard_scale, cluster_rows, cmap_name, max_size, max_highlight_size, default_style, highlight_col, highlight_size, special_character_regex_pattern, exclude_interactions, title, return_table, figsize)
    216 # filter
    217 means_matx = filter_interaction_and_celltype(data=means_mat, genes=query, celltype_pairs=ct_columns)
--> 218 pvals_matx = filter_interaction_and_celltype(data=pvals_mat, genes=query, celltype_pairs=ct_columns)
    219 # reorder the columns
    220 col_order = []

File ~/my-conda-envs/sc_Harm/lib/python3.10/site-packages/ktplotspy/utils/support.py:92, in filter_interaction_and_celltype(data, genes, celltype_pairs)
     75 def filter_interaction_and_celltype(data: pd.DataFrame, genes: List, celltype_pairs: List) -> pd.DataFrame:
     76     """Filter data to interactions and celltypes.
     77 
     78     Parameters
   (...)
     90         Filtered dataframe.
     91     """
---> 92     filtered_data = data[data.interacting_pair.isin(genes)][celltype_pairs]
     93     return filtered_data

File ~/my-conda-envs/sc_Harm/lib/python3.10/site-packages/pandas/core/frame.py:3811, in DataFrame.__getitem__(self, key)
   3809     if is_iterator(key):
   3810         key = list(key)
-> 3811     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3813 # take() does not accept boolean indexers
   3814 if getattr(indexer, "dtype", None) == bool:

File ~/my-conda-envs/sc_Harm/lib/python3.10/site-packages/pandas/core/indexes/base.py:6113, in Index._get_indexer_strict(self, key, axis_name)
   6110 else:
   6111     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6113 self._raise_if_missing(keyarr, indexer, axis_name)
   6115 keyarr = self.take(indexer)
   6116 if isinstance(key, Index):
   6117     # GH 42790 - Preserve name from an Index

File ~/my-conda-envs/sc_Harm/lib/python3.10/site-packages/pandas/core/indexes/base.py:6176, in Index._raise_if_missing(self, key, indexer, axis_name)
   6173     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6175 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6176 raise KeyError(f"{not_found} not in index")

KeyError: "['thy_TH_processing>@<thy_TH_processing', 'thy_TH_processing>@<mes_CYGB', 'thy_TH_processing>@<thy_Lumen-forming', 'thy_TH_processing>@<end_Arterial', 'thy_TH_processing>@<mes_IGF1R', 'thy_TH_processing>@<mes_SCN7A', 'thy_TH_processing>@<mes_CCL19', 'end_Capillary>@<thy_TH_processing', 'end_Capillary>@<thy_Lumen-forming', 'end_Capillary>@<mes_IGF1R', 'end_Capillary>@<mes_SCN7A', 'thy_Lumen-forming>@<thy_TH_processing', 'thy_Lumen-forming>@<mes_CYGB', 'thy_Lumen-forming>@<thy_Lumen-forming', 'thy_Lumen-forming>@<end_Arterial', 'thy_Lumen-forming>@<mes_IGF1R', 'thy_Lumen-forming>@<mes_SCN7A', 'thy_Lumen-forming>@<mes_CCL19', 'end_Arterial>@<thy_TH_processing', 'end_Arterial>@<thy_Lumen-forming', 'end_Arterial>@<end_Arterial', 'end_Arterial>@<mes_IGF1R', 'end_Arterial>@<mes_CCL19', 'mes_IGF1R>@<thy_TH_processing', 'mes_IGF1R>@<thy_Lumen-forming', 'mes_IGF1R>@<end_Arterial', 'mes_IGF1R>@<mes_IGF1R', 'mes_IGF1R>@<mes_SCN7A', 'mes_IGF1R>@<mes_CCL19', 'mes_SCN7A>@<mes_SCN7A', 'mes_CCL19>@<thy_TH_processing', 'mes_CCL19>@<thy_Lumen-forming', 'mes_CCL19>@<end_Arterial', 'mes_CCL19>@<mes_IGF1R', 'mes_CCL19>@<mes_SCN7A', 'mes_CCL19>@<mes_CCL19', 'end_Cycling>@<end_Cycling', 'end_Cycling>@<mes_Cycling', 'end_Cycling>@<thy_Cycling', 'mes_Cycling>@<end_Cycling', 'mes_Cycling>@<mes_Cycling', 'mes_Cycling>@<thy_Cycling', 'thy_Cycling>@<end_Cycling', 'thy_Cycling>@<mes_Cycling', 'thy_Cycling>@<thy_Cycling'] not in index"

Any hint as to what I did wrong? Thanks.

hmassalha commented 8 months ago

After decreasing the number of cell types, I observed that the function attempts to identify all possible combinations of celltypes between cell_type1 and cell_type2. In exploratory projects, we may not anticipate relevant interactions for every cell type. Therefore, I presume that plot_cpdb will generate output for whichever combinations exist between cell_type1 and cell_type2, without assuming that all combinations will be included in the output. Thank you.
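The observation above can be checked before plotting. The following is a minimal sketch, not ktplotspy's own code: it assumes the cell type patterns have already been resolved to explicit lists and that the pair columns in the CellPhoneDB output are named `A|B`, as in the column names quoted later in this thread. The helper name `missing_pair_columns` and the toy data are made up for illustration.

```python
from itertools import product


def missing_pair_columns(cell_types_1, cell_types_2, present_columns, sep="|"):
    """Return the expected 'A|B' pair columns that are absent from present_columns.

    Hypothetical helper for illustration; not part of ktplotspy.
    """
    present = set(present_columns)
    # Every combination of a cell type from the first list with one from the second.
    expected = [f"{a}{sep}{b}" for a, b in product(cell_types_1, cell_types_2)]
    return [col for col in expected if col not in present]


# Toy data, not taken from the thread.
cell_types = ["T_cell", "B_cell"]
pvals_columns = ["interacting_pair", "T_cell|T_cell", "T_cell|B_cell", "B_cell|T_cell"]

missing = missing_pair_columns(cell_types, cell_types, pvals_columns)
print(missing)  # ['B_cell|B_cell']
```

With real data, `present_columns` would be something like `relevant_interactions.columns`; any names returned are the combinations that the column lookup would fail to find.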

zktuong commented 8 months ago

Hi, i can't replicate this issue with the test set. i'm guessing your single-cell data contains a lot more celltypes than what was provided to run cellphonedb? if so, just slice your single-cell data first.

hmassalha commented 8 months ago

Thanks for your reply. I am not sure what you mean by slicing the single-cell data first. Do you mean removing cell types from the adata? I think the issue is about combinations of cell types. Removing one cell type will dramatically affect the results for other combinations that might have significant interactions. Tracing the function's code, I see that you filter for genes while requiring all cell type combinations to be present in the relevant_interactions file, which is what happens in my case.

zktuong commented 8 months ago

correct. remove the celltypes from the adata you are providing to plot_cpdb. this does not impact on your actual cellphonedb run as this is purely for visualisation.
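One way to find which cell types to keep is to read them off the pair columns of the CellPhoneDB output. This is a rough sketch under stated assumptions: pair columns are named `A|B`, cell type names do not themselves contain `|`, and the helper name `celltypes_in_pair_columns` is hypothetical. It only helps when whole cell types are absent from the CellPhoneDB run, not when individual pair columns are missing.

```python
def celltypes_in_pair_columns(columns, sep="|"):
    """Collect cell type names from 'A|B' style column names.

    Hypothetical helper for illustration; not part of ktplotspy.
    """
    found = set()
    for col in columns:
        parts = col.split(sep)
        if len(parts) == 2:  # skip metadata columns such as 'interacting_pair'
            found.update(parts)
    return found


# Toy column names, not taken from the thread.
pvals_columns = ["interacting_pair", "T_cell|B_cell", "B_cell|T_cell"]
kept = celltypes_in_pair_columns(pvals_columns)
print(sorted(kept))  # ['B_cell', 'T_cell']

# With a real AnnData object, the slice could then look like this
# (assumes the cell type labels live in adata.obs["celltype"]):
# adata_sub = adata[adata.obs["celltype"].isin(list(kept))].copy()
```

The sliced object would then be passed as `adata` to plot_cpdb in place of the full one.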

shenwenkang commented 8 months ago

I have the same problem. I found that the problem was with the CellphoneDB results. I am using cpdb_degs_analysis_method for the calculations and then I am getting the following results:

list(cpdb_results.keys())
['deconvoluted', 'deconvoluted_percents', 'means', 'relevant_interactions', 'significant_means', 'CellSign_active_interactions', 'CellSign_active_interactions_deconvoluted', 'interaction_scores']

The problem is that there are missing columns in the dataframe relevant_interactions (the missing column names are those where no interactions are calculated between the two cell types, i.e. the column A|B of relevant_interactions is all 0, and the column does not appear in relevant_interactions)

col1 = means.columns.to_list()
col2 = pvals.columns.to_list()
list(set(col1) - set(col2))
['CD4_Th17|CD8_Tn_IFN_response', 'CD4_Th17|None_T', 'None_T|None_T', 'CD8_Tn_IFN_response|CD8_Tn_IFN_response', 'None_T|CD8_Tn_IFN_response', 'CD8_Tn_IFN_response|None_T']

The solution is just to add the missing columns in relevant_interactions:

for col in cpdb_results['means'].columns:
    if col not in cpdb_results['relevant_interactions'].columns:
        cpdb_results['relevant_interactions'][col] = 0

zktuong commented 8 months ago

thanks @shenwenkang for finding the solution! will try and fix this when i have time.

zktuong commented 6 months ago

seems like the fix introduces further bugs (e.g. plotting non-significant interactions). need to think of a different solution

zktuong commented 6 months ago

@hmassalha @shenwenkang, could i ask you to try if this version works?

you can install with

pip install git+https://www.github.com/zktuong/ktplots@fix-uneven-columns-and-rows

or just use the master branch after i've merged this

zktuong / ktplotspy

plot_cpdb KeyError #51