scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Errors when running ir.pl.clonotype_imbalance and sc.pl.umap of specific clonotypes #244

Open Movahedilab opened 3 years ago

Movahedilab commented 3 years ago

Thank you for the really great tool! I was trying to rerun the tutorial, but I got several errors. I would appreciate any help on what could have gone wrong. Two of them were while running ir.pl.clonotype_imbalance:

ir.pl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="source",
    case_label="Tumor",
    plot_type="strip",
)
WARNING: Clonotype imbalance not found. Running `ir.tl.clonotype_imbalance` and storing under {key_added}
WARNING: Clonotype imbalance calculation depends on repertoire overlap. We could not detect any previous runs of repertoire_overlap, so the tool is running now...
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-74-9b5b8bddb872> in <module>
      4     groupby="source",
      5     case_label="Tumor",
----> 6     plot_type="strip",
      7 )

~/anaconda3/lib/python3.7/site-packages/scirpy/_plotting/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, top_n, fraction, inplace, plot_type, key_added, xlab, ylab, title, **kwargs)
     95             additional_hue=additional_hue,
     96             fraction=fraction,
---> 97             key_added=key_added,
     98         )
     99 

~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, fraction, inplace, overlap_key, key_added)
    119         for suspect in suspects:
    120             p, logfoldchange, rel_case_sizes, rel_control_sizes = _calculate_imbalance(
--> 121                 tdf1[suspect], tdf2[suspect], ncase, ncontrol, global_minimum
    122             )
    123             clt_stats.append([suspect, p, -np.log10(p), logfoldchange])

~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in _calculate_imbalance(case_sizes, control_sizes, ncase, ncontrol, global_minimum)
    271     )
    272     logfoldchange = np.log2(
--> 273         (case_mean_freq + global_minimum) / (control_mean_freq + global_minimum)
    274     )
    275     return p, logfoldchange, rel_case_sizes, rel_control_sizes

ZeroDivisionError: float division by zero

Second error:

ir.pl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="source",
    case_label="Tumor",
    additional_hue="diagnosis",
    plot_type="volcano",
    fig_kws={"dpi": 120},
)
WARNING: Clonotype imbalance not found. Running `ir.tl.clonotype_imbalance` and storing under {key_added}
WARNING: Clonotype imbalance calculation depends on repertoire overlap. We could not detect any previous runs of repertoire_overlap, so the tool is running now...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'diagnosis'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-75-f891a51d2311> in <module>
      6     additional_hue="diagnosis",
      7     plot_type="volcano",
----> 8     fig_kws={"dpi": 120},
      9 )

~/anaconda3/lib/python3.7/site-packages/scirpy/_plotting/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, top_n, fraction, inplace, plot_type, key_added, xlab, ylab, title, **kwargs)
     95             additional_hue=additional_hue,
     96             fraction=fraction,
---> 97             key_added=key_added,
     98         )
     99 

~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, fraction, inplace, overlap_key, key_added)
     97     # Create a series of case-control groups for comparison
     98     case_control_groups = _create_case_control_groups(
---> 99         adata.obs, replicate_col, groupby, additional_hue, case_label, control_label
    100     )
    101 

~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in _create_case_control_groups(df, replicate_col, groupby, additional_hue, case_label, control_label)
    199     else:
    200         group_cols.append(additional_hue)
--> 201         hues = df[additional_hue].unique()
    202     df = df.groupby(group_cols, observed=True).agg("size").reset_index()
    203 

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'diagnosis'

The third error occurred when trying to plot the top differential clonotypes between the CD8_Teff and CD8_Trm clusters on a UMAP:

freq, stat = ir.tl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="cluster",
    case_label="CD8_Teff",
    control_label="CD8_Trm",
    inplace=False,
)
top_differential_clonotypes = stat["clonotype"].tolist()[:5]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), gridspec_kw={"wspace": 0.6})
sc.pl.umap(adata, color="cluster", ax=ax1, show=False)
sc.pl.umap(
    adata,
    color="clonotype",
    groups=top_differential_clonotypes,
    ax=ax2,
    # increase size of highlighted dots
    size=[
        80 if c in top_differential_clonotypes else 30 for c in adata.obs["clonotype"]
    ],
)
TypeError                                 Traceback (most recent call last)
<ipython-input-78-98b9b22b216a> in <module>
      8     # increase size of highlighted dots
      9     size=[
---> 10         80 if c in top_differential_clonotypes else 30 for c in adata.obs["clonotype"]
     11     ],
     12 )

~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in umap(adata, **kwargs)
    603     If `show==False` a :class:`~matplotlib.axes.Axes` or a list of it.
    604     """
--> 605     return embedding(adata, 'umap', **kwargs)
    606 
    607 

~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in embedding(adata, basis, color, gene_symbols, use_raw, sort_order, edges, edges_width, edges_color, neighbors_key, arrows, arrows_kwds, groups, components, layer, projection, scale_factor, color_map, cmap, palette, na_color, na_in_legend, size, frameon, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, vmax, vmin, add_outline, outline_width, outline_color, ncols, hspace, wspace, title, show, save, ax, return_fig, **kwargs)
    243             use_raw=use_raw,
    244             gene_symbols=gene_symbols,
--> 245             groups=groups,
    246         )
    247         color_vector, categorical = _color_vector(

~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in _get_color_source_vector(adata, value_to_plot, use_raw, gene_symbols, layer, groups)
   1090         values = adata.obs_vector(value_to_plot, layer=layer)
   1091     if groups and is_categorical_dtype(values):
-> 1092         values = values.replace(values.categories.difference(groups), np.nan)
   1093     return values
   1094 

~/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in replace(self, to_replace, value, inplace)
   2442         inplace = validate_bool_kwarg(inplace, "inplace")
   2443         cat = self if inplace else self.copy()
-> 2444         if to_replace in cat.categories:
   2445             if isna(value):
   2446                 cat.remove_categories(to_replace, inplace=True)

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __contains__(self, key)
   3898     @Appender(_index_shared_docs["contains"] % _index_doc_kwargs)
   3899     def __contains__(self, key) -> bool:
-> 3900         hash(key)
   3901         try:
   3902             return key in self._engine

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __hash__(self)
   3905 
   3906     def __hash__(self):
-> 3907         raise TypeError(f"unhashable type: {repr(type(self).__name__)}")
   3908 
   3909     def __setitem__(self, key, value):

TypeError: unhashable type: 'Index'

Here are the package versions I have:

scanpy==1.7.1 anndata==0.7.3 umap==0.4.4 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.0 python-igraph==0.7.1 leidenalg==0.8.0 scirpy==0.6.1

Here is the whole code I used: Scirpy_tutorial_3kT_cancer.zip

Thanks in advance!

grst commented 3 years ago

Hi @Movahedilab,

thanks for your report! I wasn't able to reproduce the error using datasets.wu2020_3k(), the indicated versions and your notebook. Since the input numbers of your notebook are not consistent, I assume that you may have done something with adata that is not part of the notebook and causes the error.

Or have you modified the numpy error handling, e.g. using seterr? Because I do get a warning about zero division, but not an error.

Can you boil the problem down to a minimal reproducible example?
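For the `KeyError: 'diagnosis'` in the second traceback, one cheap check to include in such an example is whether the columns passed to `ir.pl.clonotype_imbalance` actually exist in `adata.obs`. The sketch below uses a toy DataFrame as a stand-in for `adata.obs`; the helper name `missing_obs_columns` is made up for illustration and is not part of scirpy.

```python
import pandas as pd


def missing_obs_columns(obs: pd.DataFrame, required):
    """Return the names in `required` that are not columns of `obs`."""
    return [col for col in required if col not in obs.columns]


# Toy stand-in for adata.obs (assumption: the real object has more columns).
obs = pd.DataFrame({"sample": ["s1", "s2"], "source": ["Tumor", "Blood"]})

# Column names taken from the failing call in the report.
missing = missing_obs_columns(obs, ["sample", "source", "diagnosis"])
print("missing columns:", missing)
```

With a real object, passing `adata.obs` in place of the toy frame would show whether `diagnosis` is present before the plotting function is called.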


Apart from that, I would be curious how you are planning to apply the clonotype_imbalance function to your data. We still consider this function experimental, and I am currently working on an improved/modified version, so I would be interested to know whether it still meets your use case.

Cheers, Gregor

Movahedilab commented 3 years ago

Hi Gregor,

Thanks for the quick response!

I don't have much experience in python, but I have attached what I hope is a reproducible example. I haven't changed anything in the numpy settings and I didn't do any modifications to the adata.

I have also tried to run the tutorial on my dataset, using the same kernel, and surprisingly pl.clonotype_imbalance worked, though sc.pl.umap of specific clonotypes still gave the same error (see the second attached notebook).

Regarding your question on my plans for clonotype_imbalance, I won't be able to use it for the data I am working on at the moment, as it doesn't have any expanded clonotypes. But for future datasets, I find this function very interesting for digging deeper into the differences between clonotypes.

I have another question/comment. In the second attached notebook (DL017-018-example.ipynb), when I plot "clonal_expansion" and "clonotype_size", I get cells from the 1, 2 and >=3 category: clonal_expansions

However, in this dataset, mostly the cells in the upper cluster are B cells, and all the remaining cells do not have BCRs, so they should be in a category "0". These cells correctly have "has_ir"=False, but "clonal_expansion"=1:

has_ir
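A quick way to quantify this mismatch is to cross-tabulate the two columns. The sketch below runs on toy data; the column names follow the ones mentioned above, but the exact values and dtypes (strings here) are an assumption and may differ between scirpy versions.

```python
import pandas as pd

# Toy stand-in for adata.obs; values are illustrative only.
obs = pd.DataFrame(
    {
        "has_ir": ["True", "True", "False", "False", "True"],
        "clonal_expansion": ["1", ">= 3", "1", "1", "2"],
    }
)

# Rows: has_ir, columns: clonal_expansion category, cells: number of cells.
table = pd.crosstab(obs["has_ir"], obs["clonal_expansion"])
print(table)

# Cells without a receptor that still carry an expansion category.
suspicious = obs[(obs["has_ir"] == "False") & obs["clonal_expansion"].notna()]
print(len(suspicious), "cells without IR but with a clonal_expansion value")
```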

Best, Daliya



grst commented 3 years ago

Hi Daliya,

unfortunately GitHub discards attachments sent by email. Could you please upload them using the web interface so that I can look into this?

Best, Gregor

Movahedilab commented 3 years ago

Sorry, here they are: Examples_scirpy.zip

grst commented 3 years ago

Thanks!

For me it runs through, although still with the warnings. Could you try to run

np.seterr(all="warn")

at the beginning of your notebook? I am starting to suspect you might have a different default numpy setting -- for whatever reason.
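As background for this suggestion, the sketch below contrasts NumPy's configurable error handling with plain Python float division. NumPy warns or raises `FloatingPointError` depending on the error state, whereas the `ZeroDivisionError` in the traceback is the exception Python itself raises for built-in floats, which `np.seterr` does not influence. Whether that means the division in `_calculate_imbalance` ran on plain Python floats is only a guess; the helper `numpy_divide` is hypothetical.

```python
import warnings

import numpy as np


def numpy_divide(a, b, mode):
    """Divide with NumPy under the given error mode: 'ignore', 'warn' or 'raise'."""
    with np.errstate(divide=mode, invalid=mode):
        return np.array([a], dtype=float) / np.array([b], dtype=float)


# Mode "warn": NumPy emits a RuntimeWarning and returns inf.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result_warn = numpy_divide(1.0, 0.0, "warn")

# Mode "raise": NumPy raises FloatingPointError, not ZeroDivisionError.
try:
    numpy_divide(1.0, 0.0, "raise")
    numpy_error = None
except FloatingPointError as err:
    numpy_error = type(err).__name__

# Plain Python floats always raise ZeroDivisionError, whatever np.seterr says.
try:
    1.0 / 0.0
    python_error = None
except ZeroDivisionError as err:
    python_error = type(err).__name__

print("current NumPy error settings:", np.geterr())
print("NumPy in raise mode:", numpy_error, "| plain Python:", python_error)
```

Printing `np.geterr()` at the top of the failing notebook would also show directly whether the defaults have been changed.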

Movahedilab commented 3 years ago

Thanks for all the effort! I ran the same notebook with np.seterr(all="warn"): Scirpy_3kT_cancer_error_example.zip

grst commented 3 years ago

Maybe different column types are the problem? https://github.com/pandas-dev/pandas/issues/17190

Yet that doesn't explain why it works for me, but not for you with the same versions. I'll keep this open and look into it when I have a bit more time.
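Until the cause is found, a possible workaround for the `unhashable type: 'Index'` error is to skip the `groups` argument and mask the column by hand. This has not been confirmed in this thread. The sketch below uses only pandas on a toy Series; the helper name `keep_only_groups` and the column name `clonotype_top` in the usage note are made up for illustration.

```python
import pandas as pd


def keep_only_groups(values: pd.Series, groups) -> pd.Series:
    """Return a copy of `values` where entries not in `groups` are NaN."""
    masked = values.where(values.isin(groups))
    if isinstance(masked.dtype, pd.CategoricalDtype):
        # Drop categories that no longer occur so they do not show up in a legend.
        masked = masked.cat.remove_unused_categories()
    return masked


# Toy stand-in for adata.obs["clonotype"].
clonotypes = pd.Series(["a", "b", "c", "a"], dtype="category")
top = keep_only_groups(clonotypes, ["a"])
print(top)
```

The masked result could then be stored in a new column, e.g. `adata.obs["clonotype_top"]`, and passed to `sc.pl.umap` via `color=` without `groups=`, so the failing `Categorical.replace` call is never reached.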