scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

can't write missing values in obsm and varm to h5ad #1146

Closed rcannood closed 11 months ago

rcannood commented 1 year ago

Report

I was porting a unit test from anndataR to Python to replicate an issue I was having in R, which is why the code looks a bit funky.

When a data frame in obsm or varm contains a column of strings with missing values, writing to h5ad fails with the error shown in the traceback:

Code:

import anndata as ad
import pandas as pd

# Empty obs/var frames that only carry the cell and gene names.
obs = pd.DataFrame(index=[f"cell{i}" for i in range(1, 11)])
var = pd.DataFrame(index=[f"gene{i}" for i in range(1, 21)])

# One string column per frame; every position not listed is None (missing).
obsm = dict(
    characters_with_nas=pd.DataFrame(
        index=obs.index,
        data=dict(
            characters_with_nas=[
                f"value{i}" if i in [1, 2, 5, 6, 9] else None
                for i in range(1, 11)
            ]
        ),
    )
)
varm = dict(
    characters_with_nas=pd.DataFrame(
        index=var.index,
        data=dict(
            characters_with_nas=[
                f"value{i}" if i in [1, 3, 4, 6, 7, 16, 17, 18, 19, 20] else None
                for i in range(1, 21)
            ]
        ),
    )
)
adata = ad.AnnData(obs=obs, var=var, obsm=obsm, varm=varm)

# This call raises the TypeError.
adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")

Traceback:

>>> adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")
Traceback (most recent call last):
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 359, in write_vlen_string_array
    f.create_dataset(k, data=elem.astype(str_dtype), dtype=str_dtype, **dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/group.py", line 183, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/dataset.py", line 166, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_core/anndata.py", line 1951, in write_h5ad
    _write_h5ad(
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 94, in write_h5ad
    write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 353, in write_elem
    Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 281, in write_mapping
    _writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 579, in write_dataframe
    _writer.write_elem(
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 229, in re_raise_error
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'characters_with_nas' of <class 'h5py._hl.group.Group'> to /

Interestingly, when I store the same data frames as obs and var instead, the write succeeds:

adata2 = ad.AnnData(
  obs = obsm["characters_with_nas"],
  var = varm["characters_with_nas"]
)
adata2.write_h5ad("anndata_to_hdf5_obsvar_character_with_nas.h5ad")
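
A possible explanation and workaround, offered as a sketch rather than a confirmed fix: write_h5ad appears to convert string columns in obs and var to categoricals before writing (via strings_to_categoricals), and categoricals can encode missing values, whereas data frames in obsm and varm seem to be written as they are. Assuming that is the cause, casting the affected columns to categorical beforehand may sidestep the error. The helper name categorize_string_columns is hypothetical and not part of anndata; the snippet uses only pandas.

```python
import pandas as pd


def categorize_string_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not an anndata API).

    Returns a copy of df in which every string column that contains
    missing values is cast to the pandas 'category' dtype.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Leave columns that are already categorical untouched.
        if isinstance(col.dtype, pd.CategoricalDtype):
            continue
        # Only consider object or string typed columns.
        if not (
            pd.api.types.is_object_dtype(col)
            or pd.api.types.is_string_dtype(col)
        ):
            continue
        has_missing = col.isna().any()
        only_strings = all(isinstance(v, str) for v in col.dropna())
        if has_missing and only_strings:
            out[name] = col.astype("category")
    return out


# Same shape as the obsm frame in the report: 10 cells, 5 missing values.
frame = pd.DataFrame(
    index=[f"cell{i}" for i in range(1, 11)],
    data=dict(
        characters_with_nas=[
            f"value{i}" if i in [1, 2, 5, 6, 9] else None
            for i in range(1, 11)
        ]
    ),
)
fixed = categorize_string_columns(frame)
print(fixed.dtypes)
```

The converted frames could then be passed as obsm and varm in place of the originals. Whether the write then succeeds has not been verified here for anndata 0.9.2.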

Versions

-----
anndata             0.9.2
pandas              2.0.3
session_info        1.0.0
-----
abrt_exception_handler3     NA
cython_runtime              NA
dateutil                    2.8.2
google                      NA
h5py                        3.9.0
natsort                     8.4.0
numpy                       1.24.3
packaging                   21.3
paste                       NA
pytz                        2023.3
scipy                       1.11.1
six                         1.16.0
systemd                     NA
zope                        NA
-----
Python 3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)]
Linux-6.4.12-100.fc37.x86_64-x86_64-with-glibc2.36
-----
Session information updated at 2023-09-22 07:10

c-westhoven commented 1 year ago

Similar issues are described in #1141, #1143, and #1068.

flying-sheep commented 11 months ago

See also scverse/scanpy#1651. Let’s track this in #1068, which contains both a reproducer and a discussion of the solution.