scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

"Interactive" Dotplot version: check for available var_names #3387

Open FrancescaDr opened 6 days ago

FrancescaDr commented 6 days ago

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Feature

Make dotplot more "interactive" so that var_names that are not present in the AnnData object are ignored. The returned dot plot should include only the var_names (i.e. genes) that are present.

This would make interactive plotting in Jupyter notebooks more convenient. Canonical marker gene lists are often run on several AnnData objects, but not all of them share the same gene panel (this is especially common with spatial transcriptomics data).

Plan

Check which var_names are available in the AnnData object before plotting:

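    from scanpy import logging as logg  # inside scanpy itself: from .. import logging as logg

    available_vars = adata.var_names
    # Collect requested names that are absent from the dataset
    missing_vars = [name for name in var_names if name not in available_vars]
    if missing_vars:
        logg.warning(
            "The following variables were not found in the dataset "
            f"and will be ignored: {', '.join(missing_vars)}"
        )
        # Keep only the names present in adata.var_names, preserving order
        var_names = [name for name in var_names if name in available_vars]
        if len(var_names) == 0:
            raise ValueError("No valid variable names found in the dataset")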
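The snippet above assumes var_names is a flat list of strings. sc.pl.dotplot also accepts a single string or a mapping from a group label to a list of genes, so the check would probably need to handle those shapes as well. Below is a minimal sketch of that, assuming a hypothetical helper named filter_var_names (not part of scanpy) and using the standard-library warnings module in place of scanpy's internal logger:

```python
import warnings
from collections.abc import Mapping


def filter_var_names(var_names, available):
    """Hypothetical helper: drop var_names absent from `available`.

    `var_names` may be a single str, a sequence of str, or a mapping
    from group label to a str or sequence of str (mappings stay mappings).
    """
    available = set(available)  # fast membership checks
    missing = []

    def keep(names):
        # Treat a single string as a one-element list, not as characters.
        names = [names] if isinstance(names, str) else list(names)
        missing.extend(n for n in names if n not in available)
        return [n for n in names if n in available]

    if isinstance(var_names, Mapping):
        # Keep group order; drop groups left with no genes.
        filtered = {}
        for group, names in var_names.items():
            kept = keep(names)
            if kept:
                filtered[group] = kept
        n_kept = sum(len(v) for v in filtered.values())
    else:
        filtered = keep(var_names)
        n_kept = len(filtered)

    if missing:
        warnings.warn(
            "The following variables were not found in the dataset "
            f"and will be ignored: {', '.join(missing)}"
        )
    if n_kept == 0:
        raise ValueError("No valid variable names found in the dataset")
    return filtered
```

For example, filter_var_names({"T": ["CD3E", "CD8A"], "B": ["CD19", "MS4A1"], "Treg": ["FOXP3"]}, ["CD3E", "CD19", "MS4A1", "NKG7"]) returns {"T": ["CD3E"], "B": ["CD19", "MS4A1"]} and warns about CD8A and FOXP3. With real data, `available` would be adata.var_names, and the result could be passed on as the var_names argument of sc.pl.dotplot. Dropping groups that end up empty avoids drawing a group label with no genes under it.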

I am unsure whether this check belongs specifically in the DotPlot class, before calling into BasePlot, or whether it generalizes to other plots and could be added to the BasePlot class before the dataframe is prepared. @flying-sheep, what is your take on this?
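To make the trade-off concrete: in scanpy, DotPlot, MatrixPlot and StackedViolin all subclass BasePlot, so a check placed in the base class would apply to all three, whereas a check in DotPlot would only change sc.pl.dotplot. The toy sketch below uses hypothetical class names and a plain list of gene names in place of an AnnData object; it does not reproduce scanpy's actual constructors or signatures.

```python
class BasePlotSketch:
    """Hypothetical stand-in for a shared plotting base class."""

    def __init__(self, available, var_names):
        # Filtering once here means every subclass inherits the behaviour.
        available = set(available)
        kept = [n for n in var_names if n in available]
        if not kept:
            raise ValueError("No valid variable names found in the dataset")
        self.var_names = kept


class DotPlotSketch(BasePlotSketch):
    """Inherits the filtering; no DotPlot-specific check is needed."""


class MatrixPlotSketch(BasePlotSketch):
    """Also inherits the filtering, which may or may not be desired."""
```

The design question is whether silently narrowing var_names should be the default for every BasePlot subclass, or an opt-in limited to the dot plot so that other plots keep failing loudly on unknown genes.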