theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License
141 stars 23 forks source link

Bug in viz.boxplots function (and possibly others) #95

Open reemagit opened 4 months ago

reemagit commented 4 months ago

Hello,

I think I found a bug in the viz.boxplots function, and also applied a solution that worked. Explaining it here in case other people ever ran into this problem.

I have a cell counts AnnData object where each row is a subject and each column are the cell counts for that subject. For each subjects I have several categorical features such as Smoke that can assume three values and Disease that assumes two values.

I first plot

viz.boxplots(data_coda, feature_name = "Disease")

This works. Then, I plot

viz.boxplots(data_coda, feature_name = "Smoke")

which also works. However, if I plot the feature "Disease" again, it crashes with the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[73], line 1
----> 1 viz.boxplots(data_coda,'GOLD')

File ~/miniforge3/envs/polvcopd4/lib/python3.9/site-packages/sccoda/util/data_visualization.py:319, in boxplots(data, feature_name, y_scale, plot_facets, add_dots, cell_types, args_boxplot, args_swarmplot, figsize, dpi, cmap, plot_legend, level_order)
    315     args_swarmplot["hue_order"] = level_order
    317 fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
--> 319 sns.boxplot(x="Cell type", y=value_name, hue=feature_name, data=plot_df, fliersize=1,
    320             palette=cmap, ax=ax, **args_boxplot)
    322 if add_dots:
    323     sns.swarmplot(
    324         x="Cell type",
    325         y=value_name,
   (...)
    331         **args_swarmplot
    332     )

File ~/miniforge3/envs/polvcopd4/lib/python3.9/site-packages/seaborn/categorical.py:1634, in boxplot(data, x, y, hue, order, hue_order, orient, color, palette, saturation, fill, dodge, width, gap, whis, linecolor, linewidth, fliersize, hue_norm, native_scale, log_scale, formatter, legend, ax, **kwargs)
   1627 color = _default_color(
   1628     ax.fill_between, hue, color,
   1629     {k: v for k, v in kwargs.items() if k in ["c", "color", "fc", "facecolor"]},
   1630     saturation=saturation,
   1631 )
   1632 linecolor = p._complement_color(linecolor, color, p._hue_map)
-> 1634 p.plot_boxes(
   1635     width=width,
   1636     dodge=dodge,
   1637     gap=gap,
   1638     fill=fill,
   1639     whis=whis,
   1640     color=color,
   1641     linecolor=linecolor,
   1642     linewidth=linewidth,
   1643     fliersize=fliersize,
   1644     plot_kws=kwargs,
   1645 )
   1647 p._add_axis_labels(ax)
   1648 p._adjust_cat_axis(ax, axis=p.orient)

File ~/miniforge3/envs/polvcopd4/lib/python3.9/site-packages/seaborn/categorical.py:745, in _CategoricalPlotter.plot_boxes(self, width, dodge, gap, fill, whis, color, linecolor, linewidth, fliersize, plot_kws)
    742     ax.add_container(BoxPlotContainer(artists))
    744 legend_artist = _get_patch_legend_artist(fill)
--> 745 self._configure_legend(ax, legend_artist, boxprops)

UnboundLocalError: local variable 'boxprops' referenced before assignment

Note that this behavior is not easy to reproduce since I think it depends on how many categories each feature has. Moreover, when plotting other features, I found that the legend contained the levels of the previous feature "Smoke". This means that some global variables are persisting between function calls and are leaking in subsequent calls. I think I reconstructed why this happens. In the data_visualization.py script, the function signature of the boxplots function is defined as

def boxplots(
        data: AnnData,
        feature_name: str,
        y_scale: str = "relative",
        plot_facets: bool = False,
        add_dots: bool = False,
        cell_types: Optional[list] = None,
        args_boxplot: Optional[dict] = {},
        args_swarmplot: Optional[dict] = {},
        figsize: Optional[Tuple[int, int]] = None,
        dpi: Optional[int] = 100,
        cmap: Optional[str] = "Blues",
        plot_legend: Optional[bool] = True,
        level_order: List[str] = None
) -> Optional[Tuple[plt.Subplot, sns.axisgrid.FacetGrid]]:

This function signature defines the default arguments of args_boxplot and args_swamplot as empty dicts ({}). Using mutable default values is generally not recommended in python because it causes unexpected behaviors between function calls (see here). To avoid this behavior, I just replaced the default values with None and filled them within the function:

def boxplots(
        data: AnnData,
        feature_name: str,
        y_scale: str = "relative",
        plot_facets: bool = False,
        add_dots: bool = False,
        cell_types: Optional[list] = None,
        args_boxplot: Optional[dict] = None,
        args_swarmplot: Optional[dict] = None,
        figsize: Optional[Tuple[int, int]] = None,
        dpi: Optional[int] = 100,
        cmap: Optional[str] = "Blues",
        plot_legend: Optional[bool] = True,
        level_order: List[str] = None
) -> Optional[Tuple[plt.Subplot, sns.axisgrid.FacetGrid]]:

and inside the function we add

if args_swarmplot is None:
    args_swarmplot = {}
if args_boxplot is None:
    args_boxplot = {}

Not sure if other functions use mutable default values. Thought it could be useful if other people ever ran into this problem.

johannesostner commented 4 months ago

Thanks for spotting this! I'll keep this fix in mind if we ever release a new version of this package. The actively maintained implementation of scCODA in the pertpy package already contains a very similar fix.

reemagit commented 4 months ago

I didn't know about the pertpy package, I'll check it out, thanks for letting me know. Not sure if I should close the issue or leave it open?

johannesostner commented 4 months ago

I'll leave it open for now, so that I won't forget about this fix. Thanks again!