Closed: ddemaeyer closed this issue 3 years ago
What kind of dtype is in .var["highly_variable"]? The error makes me suspect object, but it'd be good to confirm.
If it is object, are all of the values boolean, and do you know how you got them?
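For anyone wanting to answer the dtype question quickly, here is a minimal sketch. It uses a stand-in pandas DataFrame named var with made-up gene names instead of a real adata.var, so the data and the variable names are assumptions for illustration only.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Stand-in for adata.var (hypothetical data): a flag column with one missing value.
var = pd.DataFrame({"highly_variable": [True, False, None]}, index=["g1", "g2", "g3"])
col = var["highly_variable"]

# 1. The dtype pandas reports for the column.
dtype = col.dtype

# 2. The Python types actually stored in the column.
type_counts = Counter(type(v).__name__ for v in col)

# 3. Whether every non-missing value is a boolean.
all_bool = all(isinstance(v, (bool, np.bool_)) for v in col.dropna())

print(dtype, dict(type_counts), all_bool)
```

On a real object, replacing var with adata.var should give the same three pieces of information.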
@grst, any insight?
Hi both,
the issue should be solved (or rather worked around by converting everything to categoricals) in the latest development version of scirpy already. I'll release v0.7 probably next week.
As far as I can tell, the error arose with h5py==3. In 2.x everything that couldn't be serialized to hdf5 was just silently converted to a str; now an error is raised (see also https://github.com/theislab/anndata/issues/493).
Edit: #504 also seems related.
Here's a minimal reproducible example:
>>> import anndata
>>> import pandas as pd
>>> import h5py
>>> anndata.__version__
'0.7.6'
>>> h5py.__version__
'3.2.1'
>>> ad = anndata.AnnData(
...     obs=pd.DataFrame().assign(foo=[True, False, None], bar=[True, True, False])
... )
>>> ad.obs["bar"].dtype
dtype('bool')
>>> ad.obs["foo"].dtype
dtype('O')
>>> ad.write_h5ad("/tmp/test.h5ad")
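The categorical workaround mentioned above could look roughly like the sketch below. It works on a plain pandas DataFrame built the same way as obs in the example; the helper name object_columns_to_categorical is hypothetical and is not part of anndata or scirpy. I have not checked how a particular anndata release handles the resulting categories when writing, so treat it as a starting point.

```python
import pandas as pd


def object_columns_to_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every object-dtype column cast to category.

    Hypothetical helper for illustration; not an anndata or scirpy function.
    """
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("category")
    return out


# Same frame as in the reproducible example: "foo" is object, "bar" is bool.
obs = pd.DataFrame().assign(foo=[True, False, None], bar=[True, True, False])
fixed = object_columns_to_categorical(obs)

print(fixed.dtypes)
```

The missing value in foo stays missing after the cast, and the bool column bar is left untouched. Assigning the result back to ad.obs before calling write_h5ad would be the intended use.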
We are indeed merging different datasets together in this component on the var slot. This results in the following dtypes on the var slot:
gene_ids              object
feature_types       category
genome              category
highly_variable       object
means                float64
dispersions          float64
dispersions_norm     float32
The problem is that during the merge, a var column is missing from one of the datasets. pandas.concat fills the missing entries with NaN, so the column ends up as object instead of bool. As a quick fix I tried convert_dtypes(), but that sets the nullable boolean extension dtype, which does not solve the issue either.
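To make the dtype change concrete, here is a small sketch with two made-up var tables (the gene names and values are invented). The last step, treating a missing flag as False and casting back to plain bool, is an assumption on my part and may not be the right semantics for every dataset.

```python
import pandas as pd

# Two hypothetical var tables; only the first one has "highly_variable".
var_a = pd.DataFrame({"highly_variable": [True, False]}, index=["g1", "g2"])
var_b = pd.DataFrame({"means": [0.1, 0.2]}, index=["g3", "g4"])

# Row-wise concat takes the union of columns and fills the gaps with NaN,
# which turns the bool column into an object column.
merged = pd.concat([var_a, var_b])

# convert_dtypes() infers the nullable "boolean" extension dtype, not numpy bool.
nullable = merged["highly_variable"].convert_dtypes()

# Assumption: a missing flag means "not highly variable".
fixed = merged["highly_variable"].astype("boolean").fillna(False).astype(bool)

print(merged["highly_variable"].dtype, nullable.dtype, fixed.dtype)
```

If missing values need to be kept distinct from False, casting the column to a categorical instead avoids having to pick a fill value.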
For the time being I'm going to pin the version of anndata for these components.
I seem to get the same issue with boolean variables, NaN values, integers:
Versions: scanpy 1.8.1, anndata 0.7.6, pandas 1.3.3, numpy 1.20.3, h5py 3.4.0
Closing as duplicate of #504
When loading an h5ad and combining it with various annotations, the object is saved as:
This results in the following exception
This is specific to version 0.7.6: our build tests pass on the same data with 0.7.5. Please note that this also affects the scirpy package, as the following code raises a similar exception, just on a different column: