Open detrout opened 4 years ago
Might help to show what the warnings are:
(Pdb) p vars(rec.list[0])
{'message': FutureWarning('is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead'),
'category': <class 'FutureWarning'>,
'filename': '/build/python-anndata-exfcES/python-anndata-0.7.4+ds/.pybuild/cpython3_3.9_anndata/build/anndata/_core/anndata.py',
'lineno': 1094,
'file': None,
'line': None,
'source': None,
'_category_name': 'FutureWarning'}
(Pdb) p vars(rec.list[1])
{'message': FutureWarning('is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead'),
'category': <class 'FutureWarning'>,
'filename': '/build/python-anndata-exfcES/python-anndata-0.7.4+ds/.pybuild/cpython3_3.9_anndata/build/anndata/_core/anndata.py',
'lineno': 1094,
'file': None,
'line': None,
'source': None,
'_category_name': 'FutureWarning'}
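For anyone who wants to inspect warnings like these outside a pdb session, here is a minimal sketch using only the standard library. The function `deprecated_helper` is a hypothetical stand-in for the library call that emits the FutureWarning; it is not part of anndata or pandas.

```python
import warnings


def deprecated_helper():
    # Hypothetical stand-in for a library call that emits a FutureWarning.
    warnings.warn("is_categorical is deprecated", FutureWarning)
    return True


# record=True collects warnings into a list instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    deprecated_helper()

# Each entry is a warnings.WarningMessage with the same attributes
# that show up in the vars() dumps above.
for w in caught:
    print(w.category.__name__, w.filename, w.lineno, w.message)
```

The recorded entries expose `message`, `category`, `filename`, and `lineno`, which is usually enough to find the call site that triggers the warning.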
After reading the warnings: Debian is shipping pandas 1.1.3, in which is_categorical is deprecated in favor of is_categorical_dtype. The patch below resolves the test failure, though I don't know if you'd rather accept the warnings in order to keep working with older versions of pandas.
As an aside, it looks like there are more from pandas.api.types import is_categorical statements than needed.
--- anndata/_core/anndata.py 2020-11-05 12:23:54.976471806 -0800
+++ /run/schroot/mount/unstable-amd64-sbuild-6f63c09f-36e0-4302-9409-6689c5b05354/build/python-anndata-exfcES/python-anndata-0.7.4+ds/anndata/_core/anndata.py 2020-11-05 15:28:33.517496264 -0800
@@ -19,5 +19,5 @@
from numpy import ma
import pandas as pd
-from pandas.api.types import is_string_dtype, is_categorical
+from pandas.api.types import is_string_dtype, is_categorical_dtype
from scipy import sparse
from scipy.sparse import issparse
@@ -1089,8 +1089,8 @@
def _remove_unused_categories(self, df_full, df_sub, uns):
- from pandas.api.types import is_categorical
+ from pandas.api.types import is_categorical_dtype
for k in df_full:
- if not is_categorical(df_full[k]):
+ if not is_categorical_dtype(df_full[k]):
continue
all_categories = df_full[k].cat.categories
@@ -1190,5 +1190,5 @@
key
for key in df.columns
- if is_string_dtype(df[key]) and not is_categorical(df[key])
+ if is_string_dtype(df[key]) and not is_categorical_dtype(df[key])
]
for key in string_cols:
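If supporting several pandas versions without deprecation warnings is a concern, one option is to test the dtype object directly instead of calling either helper. This is only a sketch, not what the anndata maintainers did: `is_categorical_column` is a hypothetical helper name, and the DataFrame is made-up example data. As far as I know, is_categorical_dtype was itself deprecated in later pandas releases, which is part of the motivation for this form.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def is_categorical_column(col: pd.Series) -> bool:
    """Hypothetical helper: True if the Series has a categorical dtype."""
    # Checking the dtype instance avoids both is_categorical and
    # is_categorical_dtype, so no FutureWarning is emitted.
    return isinstance(col.dtype, CategoricalDtype)


# Made-up example data: one categorical, one string, one integer column.
df = pd.DataFrame(
    {
        "a": pd.Categorical(["x", "y", "x"]),
        "b": ["p", "q", "r"],
        "c": [1, 2, 3],
    }
)

categorical_cols = [k for k in df.columns if is_categorical_column(df[k])]
print(categorical_cols)
```

The same helper could replace the is_categorical calls in both hunks of the patch, assuming the columns are always pandas Series.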
Thanks for the report and diagnosis!
I believe we've already done this on master, but haven't made a release yet. I think we'll be making a bugfix release soon which will include this.
Just made that release, are your builds working now?
@ivirshup Hello! The build works on 64-bit systems, but the tests fail on 32-bit systems.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982098
=================================== FAILURES ===================================
___________________ test_set_dataframe[<lambda>-False-int32] ___________________
homogenous = False, df = <function <lambda> at 0xf1d74148>
dtype = <class 'numpy.int32'>
@pytest.mark.parametrize(
"df,homogenous,dtype",
[
(lambda: gen_typed_df_t2_size(*X.shape), True, np.object_),
(lambda: pd.DataFrame(X ** 2), False, np.int_),
],
)
def test_set_dataframe(homogenous, df, dtype):
adata = AnnData(X)
if homogenous:
with pytest.warns(UserWarning, match=r"Layer 'df'.*dtype object"):
adata.layers["df"] = df()
else:
with pytest.warns(None) as warnings:
adata.layers["df"] = df()
assert not len(warnings)
assert isinstance(adata.layers["df"], np.ndarray)
> assert np.issubdtype(adata.layers["df"].dtype, dtype)
E AssertionError: assert False
E + where False = <function issubdtype at 0xf60991d8>(dtype('int32'), <class 'numpy.int32'>)
E + where <function issubdtype at 0xf60991d8> = np.issubdtype
E + and dtype('int32') = array([[ 1, 4, 9],\n [16, 25, 36],\n [49, 64, 81]], dtype=int32).dtype
/usr/lib/python3/dist-packages/anndata/tests/test_layers.py:62: AssertionError
Perhaps this has been fixed and you can suggest a commit or PR that we can apply as a patch? Or a new release?
Thanks,
Not sure that's a bug on our end. That assertion looks like it should be fine, regardless of platform.
To me, it looks like there is a bug in whatever version of numpy that system is using?
This is just weird, and it supports the idea that the problem is in numpy.
I ran python3 -m pytest --pdb --pyargs anndata in an i386 chroot, and in the resulting pdb session at anndata/tests/test_layers.py:62 I found the following sequence quite weird.
Module variables from test_layers.py:10
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
L = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
(Pdb) p X.dtype, (X ** 2).dtype
(dtype('int32'), dtype('int32'))
(Pdb) p np.issubdtype(X.dtype, np.int_)
True
(Pdb) p np.issubdtype((X ** 2).dtype, np.int_)
False
(Pdb) p X.dtype is L.dtype
True
(Pdb) p X.dtype is (X ** 2).dtype
False
Yep this reproduces the issue and only depends on numpy.
It runs correctly on 64-bit intel and fails on 32-bit intel.
import numpy as np
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
assert np.issubdtype(X.dtype, np.integer)
assert np.issubdtype(X.dtype, np.int_)
assert np.issubdtype((X**2).dtype, np.integer)
assert np.issubdtype((X**2).dtype, np.int_)
On an i386 system:
Traceback (most recent call last):
File "/tmp/repro.py", line 8, in <module>
assert np.issubdtype((X**2).dtype, np.int_)
AssertionError
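One possible explanation, which is an assumption and not something confirmed in this thread: on i386 both C int and C long are 32 bits wide, so numpy has two distinct scalar types that both print as int32, and np.issubdtype compares scalar types rather than widths. If that is what is happening, comparing dtype objects for equality may be a more robust check in the test. The helper name `same_int_dtype` is hypothetical.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
squared = X ** 2


def same_int_dtype(a, b) -> bool:
    """Hypothetical helper: True if both are equivalent signed-integer dtypes."""
    a, b = np.dtype(a), np.dtype(b)
    # dtype equality compares the data layout (kind, size, byte order),
    # not the identity of the underlying scalar type.
    return a.kind == "i" and a == b


print(same_int_dtype(squared.dtype, np.int_))

# Compare the widths of C int and numpy's default integer on this platform.
print(np.dtype(np.intc).itemsize, np.dtype(np.int_).itemsize)
```

In the failing test, this would mean replacing the np.issubdtype assertion with a dtype equality check, or asserting against np.integer when only "some integer type" matters.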
Hello,
We were trying to package anndata 0.7.4 for debian-med and ran into some test failures. (There's a minor issue where the Debian test process builds the package in a separate directory and runs the tests there, and some of the small test data files aren't copied over, so the tests fail. I added some setup(package_data) globs and fixed that; I can file a separate issue if you're interested.)
The issue I'm more stumped on is this one: we're testing with python3.8 & 3.9 and get this exception with both versions of python. By any chance, do you have any suggestions as to what might be wrong?