scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Upgrading from 0.7.5 to 0.7.6 throws TypeError when saving h5ad #558

Closed ddemaeyer closed 3 years ago

ddemaeyer commented 3 years ago

When loading an h5ad and combining it with various annotations, the object is saved as:

mergedData.write(par["output"], compression = "gzip")     

This results in the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 270, in write_series
    group.create_dataset(
  File "/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py", line 148, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/usr/local/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
    write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'highly_variable' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/viash-run-mergeabtogex-GbM6xh", line 73, in <module>
    mergedData.write(par["output"], compression = "gzip")     
  File "/usr/local/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1905, in write_h5ad
    _write_h5ad(
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 112, in write_h5ad
    write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
  File "/usr/local/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'highly_variable' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'var' of <class 'h5py._hl.files.File'> from /.

This is specific to version 0.7.6: our build tests pass on the same data with 0.7.5. Note that this also affects the scirpy package, as the following code raises a similar exception, just on a different column:

import anndata
import scanpy as sc
import scirpy

data = sc.read_h5ad(par["input"])
contigs = scirpy.io.read_10x_vdj(par["input_vdj"])

scirpy.pp.merge_with_ir(data, contigs)

data.write(par["output"], compression = "gzip")

ivirshup commented 3 years ago

What kind of dtype is in .var["highly_variable"]? The error makes me suspect object, but it'd be good to confirm.

If it is object, are all of the values boolean, and do you know how you got them?
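One way to answer both questions is to look at the column's dtype and at the Python type of each value. The following is a minimal sketch using pandas only; the `var` frame is a hypothetical stand-in for `mergedData.var`, not data from this report.

```python
import pandas as pd

# Hypothetical stand-in for mergedData.var (assumption, not the reporter's data).
var = pd.DataFrame(
    {"highly_variable": [True, False, None]},
    index=["gene1", "gene2", "gene3"],
)

col = var["highly_variable"]
print(col.dtype)  # storage dtype of the column

# Count the Python type of every value to see what an object column really holds.
type_counts = col.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If anything other than `str` shows up in the counts for an object column, that column is a likely candidate for the error above.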

ivirshup commented 3 years ago

@grst, any insight?

grst commented 3 years ago

Hi both,

the issue should already be solved (or rather worked around, by converting everything to categoricals) in the latest development version of scirpy. I'll probably release v0.7 next week.

As far as I can tell, the error arose with h5py==3. In 2.x, everything that couldn't be serialized to HDF5 was silently converted to a str; now an error is raised (see also https://github.com/theislab/anndata/issues/493).

Edit: #504 also seems related.
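As an illustration of the categorical workaround mentioned above, here is a rough sketch in plain pandas. It is not scirpy's actual code: the helper name `object_to_str_categorical` is made up for this example, and casting values to strings first is an assumption about what makes the categories safe to store. It has not been checked against anndata 0.7.6 here.

```python
import pandas as pd


def object_to_str_categorical(series: pd.Series) -> pd.Series:
    """Cast non-missing values to str, keep missing values, return a categorical.

    Hypothetical helper for illustration only.
    """
    as_text = series.where(series.isna(), series.astype(str))
    return as_text.astype("category")


obs = pd.DataFrame({"foo": [True, False, None], "bar": [True, True, False]})

# Only object columns are touched; proper bool columns such as "bar" stay as they are.
for name in list(obs.select_dtypes(include="object").columns):
    obs[name] = object_to_str_categorical(obs[name])

print(obs.dtypes)
```

Note that this turns `True`/`False` into the strings `"True"`/`"False"`, so downstream code that expects booleans would need to convert back.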


Here's a minimal reproducible example:

>>> import anndata
>>> import pandas as pd
>>> import h5py
>>> anndata.__version__
'0.7.6'
>>> h5py.__version__
'3.2.1'
>>> ad = anndata.AnnData(
...     obs=pd.DataFrame().assign(foo=[True, False, None], bar=[True, True, False])
... )
>>> ad.obs["bar"].dtype
dtype('bool')
>>> ad.obs["foo"].dtype
dtype('O')
>>> ad.write_h5ad("/tmp/test.h5ad")
Traceback

```pytb
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
    269     if series.dtype == object:  # Assuming it’s string
--> 270         group.create_dataset(
    271             key,

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    147
--> 148             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    149             dset = dataset.Dataset(dsid)

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
    139     if (data is not None) and (not isinstance(data, Empty)):
--> 140         dset_id.write(h5s.ALL, h5s.ALL, data)
    141

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.write()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_conv.pyx in h5py._conv.str2vlen()

h5py/_conv.pyx in h5py._conv.generic_converter()

h5py/_conv.pyx in h5py._conv.conv_str2vlen()

TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    262     for col_name, (_, series) in zip(col_names, df.items()):
--> 263         write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
    264

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'foo' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
----> 1 ad.write_h5ad("/tmp/test.h5ad")

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1903             filename = self.filename
   1904
-> 1905         _write_h5ad(
   1906             Path(filename),
   1907             self,

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    109         else:
    110             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    112         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    113         write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)

~/anaconda3/envs/sctcrpy2/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    128     if key in f:
    129         del f[key]
--> 130     _write_method(type(value))(f, key, value, *args, **kwargs)
    131
    132

~/anaconda3/envs/sctcrpy2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    210         except Exception as e:
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"
    214                 f"Above error raised while writing key {key!r} of {type(elem)}"

TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'foo' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
```

ddemaeyer commented 3 years ago

We are indeed merging different datasets together on the var slot in this component. This results in the following dtypes in var:

gene_ids              object
feature_types       category
genome              category
highly_variable       object
means                float64
dispersions          float64
dispersions_norm     float32

The problem is that during the merge, one of the datasets lacks a var column that the others have. Concatenating with pandas.concat therefore yields an object dtype instead of bool. As a quick fix I tried convert_dtypes(), but that sets the column to the nullable boolean dtype, which does not solve the issue either.
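The dtype change described here can be reproduced with pandas alone. The sketch below uses two made-up var tables (`var_a`, `var_b`) and then restores a plain bool column by treating missing entries as False. That fill value is an assumption for illustration; whether False is the right meaning for genes from a dataset without the column depends on the analysis.

```python
import pandas as pd

# Two hypothetical var tables; only the first has a highly_variable column.
var_a = pd.DataFrame({"highly_variable": [True, False]}, index=["gene1", "gene2"])
var_b = pd.DataFrame({"means": [0.5]}, index=["gene3"])

merged = pd.concat([var_a, var_b])
print(merged.dtypes)  # highly_variable is now object because of the missing entry

# Assumption: a missing flag means "not highly variable".
fixed = merged.copy()
fixed["highly_variable"] = (
    merged["highly_variable"]
    .astype("boolean")  # nullable boolean, keeps the missing value as <NA>
    .fillna(False)      # decide what a missing flag should mean
    .astype(bool)       # back to a plain NumPy bool column
)
print(fixed.dtypes)
```

The last cast is the step that convert_dtypes() alone does not perform: it stops at the nullable boolean dtype.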

For the time being I'm going to pin the version of anndata for these components.

vitkl commented 3 years ago

I seem to get the same issue with boolean variables, NaN values, integers:

[Image: Screenshot 2021-10-29 at 17 39 05]

scanpy 1.8.1, anndata 0.7.6, pandas 1.3.3, numpy 1.20.3, h5py 3.4.0

ivirshup commented 3 years ago

Closing as duplicate of #504