scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

The read_10x_mtx() function does not work due to the update of the anndata package to 0.10.4 (January 14, 2024; Scanpy=1.9.6) #2806

Closed. NikolaiL-dev closed this issue 10 months ago.

NikolaiL-dev commented 10 months ago


What happened?

The read_10x_mtx() function no longer works correctly after the update of the anndata package to 0.10.4 (January 14, 2024). The problem appears to be in reading the features.tsv.gz file.

I ran it on the following data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5733023

You can also download the files from my Google Drive: https://drive.google.com/drive/folders/1p6ilbsJX_cYZb4HG0OSbLHAwQObqmncW?usp=sharing

What I did:

1) I installed the latest version of scanpy (1.9.6) using conda:

$ conda --version
conda 23.10.0
$ conda install scanpy
# Channels:
#  - conda-forge
#  - bioconda
#  - defaults
# Platform: linux-64

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anndata                   0.10.4             pyhd8ed1ab_0    conda-forge
array-api-compat          1.4                pyhd8ed1ab_0    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.25.0               hd590300_0    conda-forge
ca-certificates           2023.11.17           hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2023.11.17         pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.0           py311h9547e67_0    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
fonttools                 4.47.2          py311h459d7ec_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
get-annotations           0.1.2              pyhd8ed1ab_0    conda-forge
h5py                      3.10.0          nompi_py311hebc2b07_101    conda-forge
hdf5                      1.14.3          nompi_h4f84152_100    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
joblib                    1.3.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py311h9547e67_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.1.2                h59595ed_1    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libhwloc                  2.9.3           default_h554bfaf_1009    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libtiff                   4.6.0                ha9c0a0a_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.11.6               h232c23b_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvmlite                  0.41.1          py311ha6695c7_0    conda-forge
matplotlib-base           3.8.2           py311h54ef318_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
natsort                   8.4.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
networkx                  3.2.1              pyhd8ed1ab_0    conda-forge
numba                     0.58.1          py311h96b013e_0    conda-forge
numpy                     1.26.3          py311h64a7726_0    conda-forge
openjpeg                  2.5.0                h488ebb8_3    conda-forge
openssl                   3.2.0                hd590300_1    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pandas                    2.1.4           py311h320fe9a_0    conda-forge
patsy                     0.5.6              pyhd8ed1ab_0    conda-forge
pillow                    10.2.0          py311ha6c5da5_0    conda-forge
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pynndescent               0.5.11             pyhca7485f_0    conda-forge
pyparsing                 3.1.1              pyhd8ed1ab_0    conda-forge
python                    3.11.7          hab00c5b_1_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2023.4             pyhd8ed1ab_0    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pytz                      2023.3.post1       pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
scanpy                    1.9.6              pyhd8ed1ab_1    conda-forge
scikit-learn              1.3.2           py311hc009520_2    conda-forge
scipy                     1.11.4          py311h64a7726_0    conda-forge
seaborn                   0.13.1               hd8ed1ab_0    conda-forge
seaborn-base              0.13.1             pyhd8ed1ab_0    conda-forge
session-info              1.0.0              pyhd8ed1ab_0    conda-forge
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
statsmodels               0.14.1          py311h1f0f07a_0    conda-forge
stdlib-list               0.8.0              pyhd8ed1ab_0    conda-forge
tbb                       2021.11.0            h00ab1b0_0    conda-forge
threadpoolctl             3.2.0              pyha21a80b_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tqdm                      4.66.1             pyhd8ed1ab_0    conda-forge
tzdata                    2023d                h0c530f3_0    conda-forge
umap-learn                0.5.5           py311h38be061_0    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

2) I imported scanpy, pandas, numpy, matplotlib and seaborn, then called read_10x_mtx(). The code is below.

>>> import scanpy as sc
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib
>>> import seaborn as sns
>>> !ls -lh ./H004/
-rwxrwxrwx. 1 nikolay nikolay  49K Mar 25  2021 barcodes.tsv.gz
-rwxrwxrwx. 1 nikolay nikolay 424K Mar 25  2021 features.tsv.gz
-rwxrwxrwx. 1 nikolay nikolay 101M Mar 25  2021 matrix.mtx.gz
>>> adata = sc.read_10x_mtx(
...     './H004/',  
...     var_names='gene_symbols',      
...     cache=True)
>>> adata.var_names_make_unique()
>>> print(adata.var)
                         gene_ids    feature_types
Gm26206        ENSMUSG00000064842  Gene Expression
Gm26206-1      ENSMUSG00000064842  Gene Expression
Gm26206-2      ENSMUSG00000064842  Gene Expression
Gm26206-3      ENSMUSG00000064842  Gene Expression
Gm26206-4      ENSMUSG00000064842  Gene Expression
...                           ...              ...
Gm26206-55445  ENSMUSG00000064842  Gene Expression
Gm26206-55446  ENSMUSG00000064842  Gene Expression
Gm26206-55447  ENSMUSG00000064842  Gene Expression
Gm26206-55448  ENSMUSG00000064842  Gene Expression
Gm26206-55449  ENSMUSG00000064842  Gene Expression

[55450 rows x 2 columns]
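The -1 to -55449 suffixes in the index above are what making duplicate names unique produces when every input name is identical. The sketch below reproduces that naming pattern (first occurrence unchanged, later duplicates get -1, -2, and so on). `make_unique_sketch` is a hypothetical helper for illustration, not anndata's implementation; unlike anndata, it does not handle collisions with names that already end in such a suffix.

```python
from collections import Counter


def make_unique_sketch(names, join="-"):
    """Simplified sketch of duplicate-name suffixing.

    Keeps the first occurrence of each name and appends join + running
    count to later duplicates. Does not guard against collisions with
    names that already look like "name-1" (anndata does).
    """
    seen = Counter()
    result = []
    for name in names:
        if seen[name] == 0:
            result.append(name)  # first occurrence keeps its name
        else:
            result.append(f"{name}{join}{seen[name]}")  # later ones get a suffix
        seen[name] += 1
    return result


# 55,450 identical symbols, as in the broken read above
names = make_unique_sketch(["Gm26206"] * 55450)
print(names[:3], names[-1])
# ['Gm26206', 'Gm26206-1', 'Gm26206-2'] Gm26206-55449
```

So the suffixed index is a downstream symptom: the suffixing step works as intended, but it receives 55,450 copies of one name.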

The problem is that gene names and IDs are imported incorrectly, both with var_names='gene_ids' and with var_names='gene_symbols': every gene ends up with the same name (Gm26206) and the same ID (ENSMUSG00000064842). With anndata 0.10.3 instead of 0.10.4, everything works correctly.
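To confirm the symptom independently of anndata, the gene IDs that were loaded can be compared with the first column of features.tsv.gz. This is a minimal stdlib sketch, assuming the usual 10x v3 layout (tab-separated, no header, columns gene_id, gene_symbol, feature_type) and mirroring the default of keeping only "Gene Expression" rows. The helper names `parse_gex_ids`, `read_gex_ids` and `first_mismatch` are hypothetical.

```python
import csv
import gzip


def parse_gex_ids(lines):
    """Gene IDs (column 1) of the Gene Expression rows of 10x v3 feature lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row[0]
        for row in reader
        # rows without a feature_type column are kept as-is
        if len(row) < 3 or row[2] == "Gene Expression"
    ]


def read_gex_ids(path):
    """Read the Gene Expression IDs from a gzipped features.tsv.gz file."""
    with gzip.open(path, "rt", newline="") as fh:
        return parse_gex_ids(fh)


def first_mismatch(loaded_ids, reference_ids):
    """Index of the first position where the two ID lists differ, or None."""
    loaded_ids = list(loaded_ids)
    if len(loaded_ids) != len(reference_ids):
        return min(len(loaded_ids), len(reference_ids))
    for i, (a, b) in enumerate(zip(loaded_ids, reference_ids)):
        if a != b:
            return i
    return None
```

Usage: `first_mismatch(adata.var["gene_ids"], read_gex_ids("./H004/features.tsv.gz"))`. A correct read should return None. With the broken read shown above, where the first row is ENSMUSG00000064842 instead of ENSMUSG00000102693, it would be expected to return 0.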

Minimal code sample

import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns

path = '<path_to_files>'  # directory with barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz

adata = sc.read_10x_mtx(
    path,
    var_names='gene_symbols',
    cache=True)

adata.var_names_make_unique()

print(adata.var)

Error output

>>> # with anndata=0.10.4
>>> print(adata.var)
                         gene_ids    feature_types
Gm26206        ENSMUSG00000064842  Gene Expression
Gm26206-1      ENSMUSG00000064842  Gene Expression
Gm26206-2      ENSMUSG00000064842  Gene Expression
Gm26206-3      ENSMUSG00000064842  Gene Expression
Gm26206-4      ENSMUSG00000064842  Gene Expression
...                           ...              ...
Gm26206-55445  ENSMUSG00000064842  Gene Expression
Gm26206-55446  ENSMUSG00000064842  Gene Expression
Gm26206-55447  ENSMUSG00000064842  Gene Expression
Gm26206-55448  ENSMUSG00000064842  Gene Expression
Gm26206-55449  ENSMUSG00000064842  Gene Expression

[55450 rows x 2 columns]

Expected

>>> # with anndata=0.10.3
>>> print(adata.var)
                         gene_ids    feature_types
4933401J01Rik  ENSMUSG00000102693  Gene Expression
Gm26206        ENSMUSG00000064842  Gene Expression
Xkr4           ENSMUSG00000051951  Gene Expression
Gm18956        ENSMUSG00000102851  Gene Expression
Gm37180        ENSMUSG00000103377  Gene Expression
...                           ...              ...
mt-Nd6         ENSMUSG00000064368  Gene Expression
mt-Te          ENSMUSG00000064369  Gene Expression
mt-Cytb        ENSMUSG00000064370  Gene Expression
mt-Tt          ENSMUSG00000064371  Gene Expression
mt-Tp          ENSMUSG00000064372  Gene Expression

[55450 rows x 2 columns]

Versions

import scanpy; scanpy.logging.print_versions()

Session with the error (anndata 0.10.4):

```
-----
anndata 0.10.4
scanpy 1.9.6
-----
PIL 10.2.0
anyio NA
arrow 1.3.0
asttokens NA
attr 23.2.0
attrs 23.2.0
babel 2.14.0
brotli 1.1.0
certifi 2023.11.17
cffi 1.16.0
charset_normalizer 3.3.2
colorama 0.4.6
comm 0.2.1
cycler 0.12.1
cython_runtime NA
dateutil 2.8.2
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
executing 2.0.1
fastjsonschema NA
fqdn NA
h5py 3.10.0
idna 3.6
ipykernel 6.28.0
isoduration NA
jedi 0.19.1
jinja2 3.1.3
joblib 1.3.2
json5 NA
jsonpointer 2.4
jsonschema 4.20.0
jsonschema_specifications NA
jupyter_events 0.9.0
jupyter_server 2.12.4
jupyterlab_server 2.25.2
kiwisolver 1.4.5
llvmlite 0.41.1
markupsafe 2.1.3
matplotlib 3.8.2
mpl_toolkits NA
natsort 8.4.0
nbformat 5.9.2
numba 0.58.1
numpy 1.26.3
overrides NA
packaging 23.2
pandas 2.1.4
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
platformdirs 4.1.0
prometheus_client NA
prompt_toolkit 3.0.42
psutil 5.9.7
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.17.2
pyparsing 3.1.1
pythonjsonlogger NA
pytz 2023.3.post1
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rpds NA
scipy 1.11.4
send2trash NA
session_info 1.0.0
six 1.16.0
sklearn 1.3.2
sniffio 1.3.0
socks 1.7.1
stack_data 0.6.2
threadpoolctl 3.2.0
tornado 6.3.3
traitlets 5.14.1
typing_extensions NA
uri_template NA
urllib3 2.1.0
wcwidth 0.2.13
webcolors 1.13
websocket 1.7.0
yaml 6.0.1
zmq 25.1.2
zoneinfo NA
-----
IPython 8.20.0
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab 4.0.10
-----
Python 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
Linux-6.6.9-200.fc39.x86_64-x86_64-with-glibc2.38
-----
Session information updated at 2024-01-14 04:38
```

Working session (anndata 0.10.3):

```
-----
anndata 0.10.3
scanpy 1.9.6
-----
PIL 10.2.0
anyio NA
arrow 1.3.0
asttokens NA
attr 23.2.0
attrs 23.2.0
babel 2.14.0
brotli 1.1.0
certifi 2023.11.17
cffi 1.16.0
charset_normalizer 3.3.2
colorama 0.4.6
comm 0.2.1
cycler 0.12.1
cython_runtime NA
dateutil 2.8.2
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
executing 2.0.1
fastjsonschema NA
fqdn NA
h5py 3.10.0
idna 3.6
ipykernel 6.28.0
isoduration NA
jedi 0.19.1
jinja2 3.1.3
joblib 1.3.2
json5 NA
jsonpointer 2.4
jsonschema 4.20.0
jsonschema_specifications NA
jupyter_events 0.9.0
jupyter_server 2.12.4
jupyterlab_server 2.25.2
kiwisolver 1.4.5
llvmlite 0.41.1
markupsafe 2.1.3
matplotlib 3.8.2
mpl_toolkits NA
natsort 8.4.0
nbformat 5.9.2
numba 0.58.1
numpy 1.26.3
overrides NA
packaging 23.2
pandas 2.1.4
parso 0.8.3
patsy 0.5.6
pexpect 4.8.0
pickleshare 0.7.5
platformdirs 4.1.0
prometheus_client NA
prompt_toolkit 3.0.42
psutil 5.9.7
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.17.2
pyparsing 3.1.1
pythonjsonlogger NA
pytz 2023.3.post1
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rpds NA
scipy 1.11.4
seaborn 0.13.1
send2trash NA
session_info 1.0.0
six 1.16.0
sklearn 1.3.2
sniffio 1.3.0
socks 1.7.1
stack_data 0.6.2
statsmodels 0.14.1
threadpoolctl 3.2.0
tornado 6.3.3
traitlets 5.14.1
typing_extensions NA
uri_template NA
urllib3 2.1.0
wcwidth 0.2.13
webcolors 1.13
websocket 1.7.0
yaml 6.0.1
zmq 25.1.2
zoneinfo NA
-----
IPython 8.20.0
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab 4.0.10
-----
Python 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
Linux-6.6.9-200.fc39.x86_64-x86_64-with-glibc2.38
-----
Session information updated at 2024-01-14 04:32
```
flying-sheep commented 10 months ago

That’s strange, because I made this change to fix this in some situations.

-gex_rows = [x == "Gene Expression" for x in adata.var["feature_types"]]
+gex_rows = adata.var["feature_types"] == "Gene Expression"
 return adata[:, gex_rows].copy()
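The thread does not say why the Series-based mask misbehaves with anndata 0.10.4. One explanation that is at least consistent with the outputs above (an assumption, not something confirmed here) is that the boolean mask ends up being used as integer positions: True becomes 1, so every selected column would be column 1, which in the expected output is Gm26206 / ENSMUSG00000064842. The NumPy sketch below reproduces that pattern on the first five genes of the expected output; it illustrates the hypothesis and is not anndata's indexing code.

```python
import numpy as np
import pandas as pd

# First five gene IDs, in file order, taken from the "Expected" output above.
ids = np.array([
    "ENSMUSG00000102693",  # 4933401J01Rik
    "ENSMUSG00000064842",  # Gm26206
    "ENSMUSG00000051951",  # Xkr4
    "ENSMUSG00000102851",  # Gm18956
    "ENSMUSG00000103377",  # Gm37180
])
feature_types = pd.Series(["Gene Expression"] * len(ids))

# Same expression as the new line in the diff; here every row is Gene Expression.
gex_rows = feature_types == "Gene Expression"

# Intended meaning: a boolean mask, which keeps all five IDs in order.
kept = ids[gex_rows.to_numpy()]

# Hypothesised misreading: the same values used as integer positions.
# True becomes 1, so every selected entry is ids[1].
positions = gex_rows.to_numpy().astype(int)
misread = ids[positions]
print(misread)  # five copies of ENSMUSG00000064842
```

If this hypothesis holds, passing the mask as a plain NumPy boolean array (as `kept` does) would sidestep the problem, but the thread itself only confirms that the fix landed in the 1.9.x branch.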

Seems like this function needs even more work.

RayRay-23 commented 10 months ago

Hi, I encountered the same issue: it cannot read the mtx and features files correctly.

JunyanReplicant commented 10 months ago

I have the same problem. I didn't expect it to be caused by the anndata package. Downgrading anndata to 0.10.2 solved it.

flying-sheep commented 10 months ago

OK, this is indeed fixed in scanpy master and the bugfix branch (1.9.x). Please try installing the current bugfix branch:

pip install 'scanpy @ git+https://github.com/scverse/scanpy.git@1.9.x'

We will release a new feature version in 3 weeks, so there will probably be no bugfix release before that.
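After downgrading anndata or installing the bugfix branch, it can help to confirm which versions are actually active in the environment before rerunning an analysis. A minimal sketch using the standard-library `importlib.metadata`; the `KNOWN_BAD` set and the helper names are hypothetical and only encode the single combination reported in this thread (scanpy 1.9.6 with anndata 0.10.4). Other versions were not tested here, so the check is deliberately narrow.

```python
from importlib.metadata import PackageNotFoundError, version

# (scanpy, anndata) pairs reported in scverse/scanpy#2806 to mislabel genes.
KNOWN_BAD = {("1.9.6", "0.10.4")}


def installed(dist):
    """Installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_10x_reader(scanpy_version, anndata_version):
    """Return a warning string for a reported bad combination, else None."""
    if (scanpy_version, anndata_version) in KNOWN_BAD:
        return (
            f"scanpy {scanpy_version} with anndata {anndata_version} was reported "
            "to mislabel genes in read_10x_mtx (scverse/scanpy#2806)"
        )
    return None


print(check_10x_reader(installed("scanpy"), installed("anndata")))
```

In the reporter's failing environment this would print the warning; with anndata 0.10.3 or 0.10.2, which were reported to work, it prints None.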