scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Debian test failure in test_get_uns_neighbors_deprecated #443

Open detrout opened 4 years ago

detrout commented 4 years ago

Hello,

We were trying to package anndata 0.7.4 for debian-med and ran into some test failures. (There's a minor issue: the Debian test process builds the package in a separate directory and runs the tests there, and some of the small test data files aren't copied over, so the tests fail. I added some setup(package_data) globs to fix that; I can file a separate issue if you're interested.)

The issue I'm more stumped on is this one. We're testing with Python 3.8 and 3.9 and get this exception with both versions. Do you have any suggestions about what might be wrong?

anndata/tests/test_deprecations.py::test_get_uns_neighbors_deprecated FAILED                                                                                   [ 36%]

adata = AnnData object with n_obs × n_vars = 2 × 3
    obs: 'anno1'
    var: 'anno2'
    uns: 'neighbors'
    layers: 'x2'
    obsp: 'connectivities'

    def test_get_uns_neighbors_deprecated(adata):
        n = adata.shape[0]
        mtx = sparse.random(n, n, density=0.3, format="csr")
        adata.obsp["connectivities"] = mtx
        adata.uns["neighbors"] = {}

        with pytest.warns(FutureWarning):
            from_uns = adata.uns["neighbors"]["connectivities"]

        assert_equal(from_uns, mtx)

        with pytest.warns(None) as rec:
            v = adata[: n // 2]
>           assert not rec
E           assert not WarningsChecker(record=True)

anndata/tests/test_deprecations.py:113: AssertionError
detrout commented 4 years ago

Might help to show what the warnings are:

(Pdb) p vars(rec.list[0])
{'message': FutureWarning('is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead'), 
 'category': <class 'FutureWarning'>, 
 'filename': '/build/python-anndata-exfcES/python-anndata-0.7.4+ds/.pybuild/cpython3_3.9_anndata/build/anndata/_core/anndata.py', 
 'lineno': 1094, 
'file': None, 
'line': None, 
'source': None, 
'_category_name': 'FutureWarning'}
(Pdb) p vars(rec.list[1])
{'message': FutureWarning('is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead'), 
'category': <class 'FutureWarning'>, 
'filename': '/build/python-anndata-exfcES/python-anndata-0.7.4+ds/.pybuild/cpython3_3.9_anndata/build/anndata/_core/anndata.py', 
'lineno': 1094, 
'file': None, 
'line': None, 
'source': None, 
'_category_name': 'FutureWarning'}
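
For anyone who wants to inspect recorded warnings outside a pdb session, here is a minimal stdlib-only sketch. The names `noisy_subset` and `call_and_record` are hypothetical stand-ins and are not part of anndata or pytest; the function merely imitates code that triggers a FutureWarning while subsetting.

```python
import warnings


def noisy_subset(values):
    # Hypothetical stand-in for code that calls a deprecated helper
    # while taking the first half of a sequence.
    warnings.warn("is_categorical is deprecated", FutureWarning)
    return values[: len(values) // 2]


def call_and_record(func, *args):
    """Run func(*args) and return (result, list of recorded warnings)."""
    with warnings.catch_warnings(record=True) as rec:
        # "always" stops the default filters from hiding repeated warnings.
        warnings.simplefilter("always")
        result = func(*args)
    return result, rec


result, rec = call_and_record(noisy_subset, [1, 2, 3, 4])
for w in rec:
    # Each entry carries the same fields shown in the pdb dump above.
    print(w.category.__name__, w.message, w.filename, w.lineno)
```

Printing the category, message, filename, and line number is usually enough to tell whether a warning comes from the package under test or from one of its dependencies.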
detrout commented 4 years ago

After reading the warnings: Debian ships pandas 1.1.3, in which is_categorical is deprecated in favor of is_categorical_dtype. The patch below resolves the test failure, though I don't know whether you'd rather accept the warnings in order to keep working with older versions of pandas.

Also, as an aside, it looks like there are more from pandas.api.types import is_categorical statements than needed.

--- anndata/_core/anndata.py    2020-11-05 12:23:54.976471806 -0800
+++ /run/schroot/mount/unstable-amd64-sbuild-6f63c09f-36e0-4302-9409-6689c5b05354/build/python-anndata-exfcES/python-anndata-0.7.4+ds/anndata/_core/anndata.py  2020-11-05 15:28:33.517496264 -0800
@@ -19,5 +19,5 @@
 from numpy import ma
 import pandas as pd
-from pandas.api.types import is_string_dtype, is_categorical
+from pandas.api.types import is_string_dtype, is_categorical_dtype
 from scipy import sparse
 from scipy.sparse import issparse
@@ -1089,8 +1089,8 @@

     def _remove_unused_categories(self, df_full, df_sub, uns):
-        from pandas.api.types import is_categorical
+        from pandas.api.types import is_categorical_dtype

         for k in df_full:
-            if not is_categorical(df_full[k]):
+            if not is_categorical_dtype(df_full[k]):
                 continue
             all_categories = df_full[k].cat.categories
@@ -1190,5 +1190,5 @@
                 key
                 for key in df.columns
-                if is_string_dtype(df[key]) and not is_categorical(df[key])
+                if is_string_dtype(df[key]) and not is_categorical_dtype(df[key])
             ]
             for key in string_cols:
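
As a side note, here is a hedged sketch of a check that avoids both helper functions. It is not the change made in anndata; it assumes that testing the column dtype against `CategoricalDtype` is acceptable for the use case. That may matter because `is_categorical_dtype` was itself deprecated later, in pandas 2.1. The helper name `categorical_columns` is made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def categorical_columns(df):
    """Return the names of columns whose dtype is categorical."""
    # Checking the dtype object directly does not rely on
    # is_categorical or is_categorical_dtype.
    return [k for k, dt in df.dtypes.items() if isinstance(dt, CategoricalDtype)]


df = pd.DataFrame(
    {
        "anno1": pd.Categorical(["a", "b"]),
        "score": [0.1, 0.2],
        "label": ["x", "y"],
    }
)
print(categorical_columns(df))
```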
ivirshup commented 4 years ago

Thanks for the report and diagnosis!

I believe we've already done this on master, but haven't made a release yet. I think we'll be making a bugfix release soon which will include this.

ivirshup commented 4 years ago

Just made that release. Are your builds working now?

mr-c commented 3 years ago

@ivirshup Hello! The build works on 64-bit systems, but the tests fail on 32-bit systems.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982098

=================================== FAILURES ===================================
___________________ test_set_dataframe[<lambda>-False-int32] ___________________

homogenous = False, df = <function <lambda> at 0xf1d74148>
dtype = <class 'numpy.int32'>

    @pytest.mark.parametrize(
        "df,homogenous,dtype",
        [
            (lambda: gen_typed_df_t2_size(*X.shape), True, np.object_),
            (lambda: pd.DataFrame(X ** 2), False, np.int_),
        ],
    )
    def test_set_dataframe(homogenous, df, dtype):
        adata = AnnData(X)
        if homogenous:
            with pytest.warns(UserWarning, match=r"Layer 'df'.*dtype object"):
                adata.layers["df"] = df()
        else:
            with pytest.warns(None) as warnings:
                adata.layers["df"] = df()
                assert not len(warnings)
        assert isinstance(adata.layers["df"], np.ndarray)
>       assert np.issubdtype(adata.layers["df"].dtype, dtype)
E       AssertionError: assert False
E        +  where False = <function issubdtype at 0xf60991d8>(dtype('int32'), <class 'numpy.int32'>)
E        +    where <function issubdtype at 0xf60991d8> = np.issubdtype
E        +    and   dtype('int32') = array([[ 1,  4,  9],\n       [16, 25, 36],\n       [49, 64, 81]], dtype=int32).dtype

/usr/lib/python3/dist-packages/anndata/tests/test_layers.py:62: AssertionError

Perhaps this has been fixed and you can suggest a commit or PR that we can apply as a patch? Or a new release?

Thanks,

ivirshup commented 3 years ago

Not sure that's a bug on our end. That assertion looks like it should be fine, regardless of platform.

To me, it looks like there is a bug in whatever version of numpy that system is using?

detrout commented 3 years ago

This is just weird and supports the idea that it's something in numpy.

I ran python3 -m pytest --pdb --pyargs anndata

in an i386 chroot and, in the resulting pdb session, found this sequence at anndata/tests/test_layers.py:62 quite weird.

Module variables from test_layers.py:10

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
L = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
(Pdb) p X.dtype, (X ** 2).dtype
(dtype('int32'), dtype('int32'))
(Pdb) p np.issubdtype(X.dtype, np.int_)
True
(Pdb) p np.issubdtype((X ** 2).dtype, np.int_)
False
(Pdb) p X.dtype is L.dtype
True
(Pdb) p X.dtype is (X ** 2).dtype
False
detrout commented 3 years ago

Yep, this reproduces the issue and depends only on numpy.

It runs correctly on 64-bit Intel and fails on 32-bit Intel.

import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

assert np.issubdtype(X.dtype, np.integer)
assert np.issubdtype(X.dtype, np.int_)
assert np.issubdtype((X**2).dtype, np.integer)
assert np.issubdtype((X**2).dtype, np.int_)

On an i386 system:

Traceback (most recent call last):
  File "/tmp/repro.py", line 8, in <module>
    assert np.issubdtype((X**2).dtype, np.int_)
AssertionError
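
One possible reading, which nothing above confirms, is that on i386 the C int and C long types are both 32 bits wide, so two distinct numpy scalar types can both print as int32 while `np.issubdtype` still treats them as different. Under that assumption, a test could compare the dtype's kind and width instead of its scalar type. The helper name `is_default_width_int` below is hypothetical and this is only a sketch of a workaround, not a fix for numpy itself.

```python
import numpy as np


def is_default_width_int(dtype):
    """True if dtype is a signed integer with the same width as np.int_."""
    dt = np.dtype(dtype)
    ref = np.dtype(np.int_)
    # Compare kind ('i' for signed integers) and byte width rather than
    # scalar-type identity, so equally sized integer types are accepted.
    return dt.kind == ref.kind and dt.itemsize == ref.itemsize


X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Widths of the C int and C long based types on this platform.
print(np.dtype(np.intc).itemsize, np.dtype(np.int_).itemsize)
# Whether the two names refer to the same scalar type on this platform.
print(np.intc is np.int_)
print(is_default_width_int(X.dtype), is_default_width_int((X ** 2).dtype))
```

The first two printed lines are platform dependent, which is why no expected output is given for them.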