scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

Writing `MuData` objects to .h5mu after using `mudata.concat()` #73

IsaacUtah1379 closed this issue 3 months ago

IsaacUtah1379 commented 3 months ago

Describe the bug

After combining one or more MuData objects with mudata.concat(), calling MuData.write() on the resulting object throws the following error:

Error

The .h5mu file is still written despite the error. If the .h5mu file is read with mudata.read(), the resulting object appears to be the same as the original object, but I have not verified whether any data is changed or lost.
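Since the file is still written, one rough way to check the round trip is to compare the per-modality layout of the object passed to MuData.write() with the object returned by mudata.read(). The sketch below is only an illustration: the helper names summarize_modalities and same_layout are hypothetical and not part of mudata, and it compares shapes and obs/var names only, not the values in .X or any other slot.

```python
def summarize_modalities(mdata):
    """Collect shape, obs names and var names for every modality.

    Hypothetical helper (not part of mudata). It only assumes that `mdata`
    exposes a dict-like `.mod` whose values have `.shape`, `.obs_names`
    and `.var_names`, as MuData and AnnData objects do.
    """
    summary = {}
    for name, adata in mdata.mod.items():
        summary[name] = (
            tuple(adata.shape),
            list(adata.obs_names),
            list(adata.var_names),
        )
    return summary


def same_layout(before, after):
    """Return True if both objects have the same modalities, shapes and names.

    This does NOT compare the data matrices themselves.
    """
    return summarize_modalities(before) == summarize_modalities(after)
```

For example, same_layout(written_object, mudata.read(path)) returning False would point to a change in modality names, dimensions, or index labels. A True result does not rule out differences in the stored values.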

To Reproduce

Example using simulated data:

import mudata
import anndata
import numpy as np

mudata.set_options(pull_on_update=False)

np.random.seed(1)

n, d, k = 1000, 100, 10

z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w = np.random.normal(size=(d,k))
y = np.dot(z, w.T)

adata = anndata.AnnData(y)
adata.obs_names = [f"obs_{i+1}" for i in range(n)]
adata.var_names = [f"var_{j+1}" for j in range(d)]

d2 = 50
w2 = np.random.normal(size=(d2,k))
y2 = np.dot(z, w2.T)

adata2 = anndata.AnnData(y2)
adata2.obs_names = [f"obs_{i+1}" for i in range(n)]
adata2.var_names = [f"var2_{j+1}" for j in range(d2)]

mdata = mudata.MuData({"A": adata, "B": adata2})

# This works without an error:
mdata.write('test_1.h5mu')

# Copy the two halves of mdata to separate variables, then recombine them
half_1 = mdata[:500,:].copy()
half_2 = mdata[500:,:].copy()
mdata2 = mudata.concat([half_1, half_2])

# This throws an error, but still writes the file:
mdata2.write('test_2.h5mu')

Expected behaviour

The .h5mu file is written without any errors.

System

gtca commented 3 months ago

Thanks for spotting that, @IsaacUtah1379! Seems like np.str_ is not appreciated by h5py.
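For context, np.str_ is a subclass of Python's built-in str, so one possible stopgap is to convert such values to plain str before writing. The sketch below is an assumption-laden illustration, not the fix applied in mudata: the helper name to_builtin_str is hypothetical, and the thread does not say where in the object the np.str_ values actually sit.

```python
def to_builtin_str(value):
    """Recursively replace str subclasses (such as numpy.str_) with plain str.

    Hypothetical helper, not part of mudata or h5py. Dicts, lists and
    tuples are traversed; every other value is returned unchanged.
    """
    if isinstance(value, str):
        # str() on an instance of a str subclass yields a built-in str
        return value if type(value) is str else str(value)
    if isinstance(value, dict):
        return {to_builtin_str(k): to_builtin_str(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_builtin_str(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_builtin_str(v) for v in value)
    return value
```

If the np.str_ values turned out to live in a dict-like slot of the concatenated object, passing a copy of that dict through to_builtin_str before calling write() might avoid the error. That placement is a guess and would need to be confirmed against the actual traceback.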