Subsetted adata object not behaving as expected

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of scanpy.
[ ] (optional) I have confirmed this bug exists on the master branch of scanpy.

Hello, I am working with an adata object (adata.shape produces (8648, 18074)) that I have subset to only include 990 genes of interest (and only include cells that express my genes of interest), with the hopes of clustering cells based on expression of my genes of interest (I got this idea from issue #510). After I subset my adata object, I confirmed that the shape of adata_sub is as expected (adata_sub.shape produces (6603, 990)). However, after running a new embedding and clustering on adata_sub, I have noticed that I can plot genes that shouldn't be in adata_sub (but were in adata), and that when I run sc.tl.rank_genes_groups my results aren't restricted to my 990 genes of interest. I am guessing that I subsetted my data incorrectly (though, why would I have the correct shape?).

Minimal code sample (that we can copy&paste without having any data)

# subset adata to genes of interest
adata_sub = adata[:, [g in genes_list for g in adata.var_names]].copy()

# filter out cells that don't express any genes of interest
sc.pp.filter_cells(adata_sub, min_genes=1)

# run new embedding and clustering
sc.pp.pca(adata_sub, n_comps=50, use_highly_variable=False, svd_solver='arpack')
sc.pp.neighbors(adata_sub)
sc.tl.umap(adata_sub)
sc.tl.leiden(adata_sub, key_added='leiden_sub')

When I use sc.pl.umap(adata_sub) to plot expression of a gene that is not one of my genes of interest, it is still plotted (I would expect an error telling me that the gene is not found in my adata_sub object). Similarly, the results of sc.tl.rank_genes_groups(adata_sub, groupby='leiden_sub', key_added='rank_genes_sub', method='wilcoxon') returns top ranked genes that are not (or should not be) in my adata_sub object.

Thank you for any help/clarification as to what's going on!

Versions

----- anndata 0.7.6 scanpy 1.8.1 sinfo 0.3.4 ----- PIL 8.3.2 anyio NA attr 21.2.0 babel 2.9.1 backcall 0.2.0 beta_ufunc NA binom_ufunc NA certifi 2019.11.28 cffi 1.14.6 chardet 3.0.4 cycler 0.10.0 cython_runtime NA dateutil 2.8.2 debugpy 1.4.3 decorator 5.1.0 defusedxml 0.7.1 entrypoints 0.3 gi 3.36.0 gio NA glib NA gobject NA gtk NA h5py 3.4.0 idna 2.8 igraph 0.9.6 ipykernel 6.4.1 ipython_genutils 0.2.0 ipywidgets 7.6.4 jedi 0.18.0 jinja2 3.0.1 joblib 1.0.1 json5 NA jsonschema 3.2.0 jupyter_server 1.11.0 jupyterlab_server 2.8.1 kiwisolver 1.3.2 leidenalg 0.8.7 llvmlite 0.37.0 markupsafe 2.0.1 matplotlib 3.4.3 matplotlib_inline NA mpl_toolkits NA natsort 7.1.1 nbclassic NA nbformat 5.1.3 nbinom_ufunc NA networkx 2.6.3 numba 0.54.0 numexpr 2.7.3 numpy 1.20.0 packaging 21.0 pandas 1.3.2 parso 0.8.2 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA prometheus_client NA prompt_toolkit 3.0.20 ptyprocess 0.7.0 pvectorc NA pycparser 2.20 pydev_ipython NA pydevconsole NA pydevd 2.4.1 pydevd_concurrency_analyser NA pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.10.0 pynndescent 0.5.4 pyparsing 2.4.7 pyrsistent NA pytz 2021.1 requests 2.22.0 scipy 1.7.1 seaborn 0.11.2 send2trash NA sitecustomize NA six 1.14.0 sklearn 0.24.2 sniffio 1.2.0 statsmodels 0.13.0rc0 storemagic NA tables 3.6.1 terminado 0.12.1 texttable 1.6.4 tornado 6.1 traitlets 5.1.0 typing_extensions NA umap 0.5.1 urllib3 1.25.8 wcwidth 0.2.5 websocket 1.2.1 zmq 22.3.0 ----- IPython 7.27.0 jupyter_client 7.0.2 jupyter_core 4.7.1 jupyterlab 3.1.11 notebook 6.4.3 ----- Python 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] Linux-5.10.25-linuxkit-x86_64-with-glibc2.29 6 logical CPU cores, x86_64 ----- Session information updated at 2021-09-30 18:02

scverse / scanpy

Subsetted adata object not behaving as expected #2007

Minimal code sample (that we can copy&paste without having any data)

Versions