[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of anndata.
[ ] (optional) I have confirmed this bug exists on the master branch of anndata.
Report
I was porting a unit test from anndataR to Python to reproduce an issue I was having in R, which is why the code looks a bit funky.
When an obsm or varm entry is a data frame containing a string column with missing values, writing an h5ad file fails with the following error:
Code:
import anndata as ad
import pandas as pd

obs = pd.DataFrame(
    index=[f"cell{i}" for i in range(1, 11)]
)
var = pd.DataFrame(
    index=[f"gene{i}" for i in range(1, 21)]
)
obsm = dict(
    characters_with_nas=pd.DataFrame(
        index=obs.index,
        data=dict(
            characters_with_nas=[f"value{i}" if i in [1, 2, 5, 6, 9] else None for i in range(1, 11)]
        )
    )
)
varm = dict(
    characters_with_nas=pd.DataFrame(
        index=var.index,
        data=dict(
            characters_with_nas=[f"value{i}" if i in [1, 3, 4, 6, 7, 16, 17, 18, 19, 20] else None for i in range(1, 21)]
        )
    )
)
adata = ad.AnnData(
    obs=obs,
    var=var,
    obsm=obsm,
    varm=varm
)
adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")
Traceback:
>>> adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")
Traceback (most recent call last):
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
result = func(g, k, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 359, in write_vlen_string_array
f.create_dataset(k, data=elem.astype(str_dtype), dtype=str_dtype, **dataset_kwargs)
File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/group.py", line 183, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/dataset.py", line 166, in make_new_dset
dset_id.write(h5s.ALL, h5s.ALL, data)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_core/anndata.py", line 1951, in write_h5ad
_write_h5ad(
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 94, in write_h5ad
write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 353, in write_elem
Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
re_raise_error(e, elem, key)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
result = func(g, k, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 281, in write_mapping
_writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
re_raise_error(e, elem, key)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
result = func(g, k, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 579, in write_dataframe
_writer.write_elem(
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
re_raise_error(e, elem, key)
File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 229, in re_raise_error
raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'characters_with_nas' of <class 'h5py._hl.group.Group'> to /
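The final TypeError is raised by h5py while it converts the column to variable-length strings, which suggests the None entries reach h5py unchanged. Below is a minimal pandas-only sketch, not taken from the failing code path, that locates the non-string entries in such a column and shows a categorical conversion that keeps them as missing values. The names `col`, `non_string_labels` and `as_categorical` are made up for illustration, and whether anndata 0.9.2 writes a categorical obsm column without error is an assumption I have not verified.

```python
import pandas as pd

# Stand-in for the obsm column from the report; object dtype is forced so the
# None entries stay as plain Python objects regardless of pandas version.
values = [f"value{i}" if i in [1, 2, 5, 6, 9] else None for i in range(1, 11)]
col = pd.Series(
    values,
    index=[f"cell{i}" for i in range(1, 11)],
    dtype=object,
    name="characters_with_nas",
)

# Row labels whose value is not a Python str. These are the entries that
# h5py's string conversion would presumably reject.
non_string_labels = [label for label, v in col.items() if not isinstance(v, str)]
print(non_string_labels)

# Possible workaround (assumption): a categorical column encodes missing
# values as code -1 instead of holding None objects.
as_categorical = col.astype("category")
print(as_categorical.cat.codes.tolist())
```

If the workaround holds, the converted column could replace the original one in the obsm/varm data frame before calling write_h5ad.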
Interestingly, when I store the same information in obs and var, the issue does not occur.
Versions
-----
anndata 0.9.2
pandas 2.0.3
session_info 1.0.0
-----
abrt_exception_handler3 NA
cython_runtime NA
dateutil 2.8.2
google NA
h5py 3.9.0
natsort 8.4.0
numpy 1.24.3
packaging 21.3
paste NA
pytz 2023.3
scipy 1.11.1
six 1.16.0
systemd NA
zope NA
-----
Python 3.11.4 (main, Jun 7 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)]
Linux-6.4.12-100.fc37.x86_64-x86_64-with-glibc2.36
-----
Session information updated at 2023-09-22 07:10