scverse / muon

muon is a multimodal omics Python framework
https://muon.scverse.org/
BSD 3-Clause "New" or "Revised" License

Can't save MuData object to h5mu file #57

Closed josenachorr closed 2 years ago

josenachorr commented 2 years ago

I created a MuData object containing the AnnData objects for two modalities, did some basic filtering of the datasets and then tried to save it with joint.write("joint_data.h5mu"), but this throws the following error:

TypeError                                 Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in get_writer(self, dest_type, typ, modifiers)
     63         if (dest_type, typ, modifiers) not in self.write:
---> 64             raise TypeError(
     65                 f"No method has been defined for writing {typ} elements to {dest_type}"

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/efec76988f/ipykernel_20272/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
   1084             raise ValueError("Provide a filename!")
   1085         else:
-> 1086             write_h5mu(filename, self, **kwargs)
   1087             if self.isbacked:
   1088                 self.file.filename = filename

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
    207 
    208     with h5py.File(filename, "w", userblock_size=512) as f:
--> 209         _write_h5mu(f, mdata, **kwargs)
    210     with open(filename, "br+") as f:
    211         nbytes = f.write(

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
     44         dataset_kwargs=kwargs,
     45     )
---> 46     write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
     47     write_attribute(file, "varm", mdata.varm, dataset_kwargs=kwargs)
     48     write_attribute(file, "obsp", mdata.obsp, dataset_kwargs=kwargs)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in write_attribute(*args, **kwargs)
    132         DeprecationWarning,
    133     )
--> 134     return write_elem(*args, **kwargs)
    135 
    136 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /

I also tried to save a MuData object with only the raw matrices (no metadata), but it throws the same error. The same happens when I save each modality on its own (in a MuData object with a single modality).

I am using Python 3.8.12, scanpy 1.9.1 and muon 0.1.2.

Thank you for your help, this is a very useful tool.

gtca commented 2 years ago

Hey @josenachorr, thanks for reporting! Which anndata version are you using? If it is the latest anndata release, v0.8, MuData is not fully compatible with it just yet, but we'll make a corresponding release soon (see the progress in https://github.com/scverse/mudata/pull/8).

Please note that this is expected to be fixed by upgrading the mudata library (https://github.com/scverse/mudata), since that's where the relevant I/O code lives.
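The first traceback is a dispatch failure: anndata 0.8's write_elem looks up a writer by destination type, element type and modifiers (see the get_writer frame from anndata/_io/specs/registry.py above) and raises TypeError when nothing is registered for MuAxisArrays. The toy registry below mimics that lookup pattern with hypothetical stand-in classes (Group, MuAxisArrays, write_mapping); it is an illustration of the mechanism, not anndata's actual code.

```python
class WriterRegistry:
    """Toy writer registry keyed by (destination type, element type, modifiers)."""

    def __init__(self):
        self.write = {}

    def register(self, dest_type, typ, modifiers=frozenset()):
        def decorator(func):
            self.write[(dest_type, typ, frozenset(modifiers))] = func
            return func
        return decorator

    def get_writer(self, dest_type, typ, modifiers=frozenset()):
        key = (dest_type, typ, frozenset(modifiers))
        if key not in self.write:
            # Same message shape as the error in the traceback above.
            raise TypeError(
                f"No method has been defined for writing {typ} elements to {dest_type}"
            )
        return self.write[key]


class Group:
    """Hypothetical stand-in for h5py.Group."""


class MuAxisArrays(dict):
    """Hypothetical stand-in for a container type with no registered writer."""


registry = WriterRegistry()


@registry.register(Group, dict)
def write_mapping(group, key, value):
    # A real writer would create an HDF5 subgroup; here we just report.
    return f"wrote {len(value)} entries to {key!r}"


def write_elem(group, key, elem):
    # Dispatch on the exact runtime type, so dict subclasses are not matched.
    writer = registry.get_writer(type(group), type(elem))
    return writer(group, key, elem)
```

In this toy version a plain dict is written fine, while a MuAxisArrays value fails with the TypeError seen above even though it behaves like a mapping. That is why the fix had to come from mudata's I/O code rather than from anything on the user's side.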

gtca commented 2 years ago

@josenachorr, and if you'd like to try out the https://github.com/scverse/mudata/pull/8 PR and let us know whether it works for you, that would be great too!

matthew-levy commented 2 years ago

Besides not being able to write h5mu files, are there any obvious issues with using AnnData v0.8? I have several v0.8 files written by Scanpy that I intend to load into muon and assign to the RNA modality of the MuData object, so I don't think I can install a previous version.

gtca commented 2 years ago

Only the I/O should be affected by the changes in AnnData. https://github.com/scverse/mudata/pull/8 seems to pass the existing tests, so I expect we'll merge it soon. You can of course also give it a try before it's merged, e.g. like this or with gh:

git clone https://github.com/scverse/mudata
cd mudata
gh pr checkout 8
pip install -e .

josenachorr commented 2 years ago

@gtca Thank you for your fast reply! My anndata version is indeed 0.8.0; I think it's the one that comes by default with the latest version of scanpy. Unfortunately, I can't install https://github.com/scverse/mudata/pull/8 in my environment (I don't have permissions), so I'll just wait for the official update.

matthew-levy commented 2 years ago

> Only the I/O should be affected due to the changes in AnnData. scverse/mudata#8 seems to pass the existing tests so I expect we'll merge it soon. You can also give it a try of course before it's merged, e.g. like this or with gh:
>
> git clone https://github.com/scverse/mudata
> cd mudata
> gh pr checkout 8
> pip install -e .

I'm sorry, I'm unfamiliar with this process. How can I do this on Windows with my Python installation from Anaconda?

gtca commented 2 years ago

@josenachorr and @matthew-levy, you should be able to give it a go with the master branch from GitHub now, e.g.:

pip install git+https://github.com/scverse/mudata

josenachorr commented 2 years ago

Thanks @gtca I could install it with no problem. Unfortunately, an error still occurs when trying to write the object (a different one this time):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_dataframe(f, key, df, dataset_kwargs)
    496         if reserved in df.columns:
--> 497             raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
    498     group = f.create_group(key)

ValueError: '_index' is a reserved name for dataframe columns.

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/tmp/697d46de78/ipykernel_2510/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
   1084             raise ValueError("Provide a filename!")
   1085         else:
-> 1086             write_h5mu(filename, self, **kwargs)
   1087             if self.isbacked:
   1088                 self.file.filename = filename

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
    205 
    206     with h5py.File(filename, "w", userblock_size=512) as f:
--> 207         _write_h5mu(f, mdata, **kwargs)
    208     with open(filename, "br+") as f:
    209         nbytes = f.write(

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
     69             write_elem(group, "X", adata.X, dataset_kwargs=kwargs)
     70         if adata.raw is not None:
---> 71             write_elem(group, "raw", adata.raw)
     72 
     73         write_elem(group, "obs", adata.obs, dataset_kwargs=kwargs)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    212     def func_wrapper(elem, key, val, *args, **kwargs):
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:
    216             if "Above error raised while writing key" in format(e):

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    173         )
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 
    177 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     22         @wraps(func)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)
     26             g[k].attrs.setdefault("encoding-version", spec.encoding_version)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_raw(f, k, raw, dataset_kwargs)
    257     g = f.create_group(k)
    258     write_elem(g, "X", raw.X, dataset_kwargs=dataset_kwargs)
--> 259     write_elem(g, "var", raw.var, dataset_kwargs=dataset_kwargs)
    260     write_elem(g, "varm", dict(raw.varm), dataset_kwargs=dataset_kwargs)
    261 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

ValueError: '_index' is a reserved name for dataframe columns.

Above error raised while writing key 'var' of <class 'h5py._hl.group.Group'> to /

gtca commented 2 years ago

Hey @josenachorr,

I think this comes from AnnData v0.8, which refuses to write a dataframe that has a column named _index. The following code causes the same error:

import numpy as np
from anndata import AnnData

x = np.random.normal(size=(10,20))
ad = AnnData(x, dtype=np.float32)
ad.obs["_index"] = "test"
ad.write("issue57.h5ad")
# => ValueError: '_index' is a reserved name for dataframe columns.
# => Above error raised while writing key 'obs' of <class 'h5py._hl.group.Group'> to /
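Since the error is triggered purely by the column name, one possible workaround is to rename any _index column before writing. The sketch below only computes a rename mapping from a list of column names; the helper name rename_reserved and the "_orig" suffix are hypothetical choices, not part of anndata or mudata.

```python
# Column name rejected by anndata 0.8's write_dataframe (see the error above).
RESERVED_COLUMNS = frozenset({"_index"})


def rename_reserved(columns, reserved=RESERVED_COLUMNS, suffix="_orig"):
    """Return an {old: new} mapping that moves reserved column names aside.

    Hypothetical helper: the new name is the old name plus `suffix`,
    extended until it no longer collides with an existing column.
    """
    existing = set(columns)
    mapping = {}
    for col in columns:
        if col in reserved:
            new = col + suffix
            while new in existing:
                new += suffix
            existing.add(new)
            mapping[col] = new
    return mapping
```

For a pandas dataframe such as ad.obs in the example above, the mapping could be applied with ad.obs = ad.obs.rename(columns=rename_reserved(ad.obs.columns)). In the traceback from josenachorr the column sits in raw.var instead; whether raw.var can be reassigned this way may depend on the anndata version, so check that before relying on it.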

I'll also tag @ivirshup for this.

gtca commented 2 years ago

I believe the muon/MuData issues raised here have been resolved. For the error about the _index column, I'll link https://github.com/scverse/anndata/issues/731 here, as it might be related. Feel free to open new issues!

dburkhardt commented 2 years ago

Can we reopen this and pin the current version of mudata to an older version of anndata? This isn't resolved:

Files to reproduce 👇 data.zip

import anndata as ad
import mudata as mu

print("anndata version: " + str(ad.__version__))
print("mudata version: " + str(mu.__version__))

rna = ad.read_h5ad("./rna.small.h5ad")
atac = ad.read_h5ad("./atac.small.h5ad")

mdata = mu.MuData({'rna':rna, 'atac':atac})
mdata.write('mdata_minrep.h5mu')

This outputs:

anndata version: 0.8.0
mudata version: 0.1.2
Unexpected exception formatting exception. Falling back to standard exception

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    f"Above error raised while writing key {key!r} of {type(elem)}"
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 64, in get_writer
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_45899/4077421615.py", line 11, in <cell line: 11>
    mdata.write('mdata_minrep.h5mu')
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/mudata.py", line 1086, in write_h5mu
    write_h5mu(filename, self, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/io.py", line 209, in write_h5mu
    _write_h5mu(f, mdata, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/io.py", line 46, in _write_h5mu
    write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 134, in write_attribute
    # -------------------------------------------------------------------------------
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 220, in func_wrapper
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 1993, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1118, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1012, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 865, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 818, in format_exception_as_a_whole
    frames.append(self.format_record(r))
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 736, in format_record
    result += ''.join(_format_traceback_lines(frame_info.lines, Colors, self.has_colors, lvals))
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 698, in lines
    pieces = self.included_pieces
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 649, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 628, in executing_piece
    return only(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/executing/executing.py", line 164, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

gtca commented 2 years ago

Hey @dburkhardt,

Thanks for making your use case so easy to run! It works: with the current mudata master branch, running your code on your files gives this output:

anndata version: 0.8.0
mudata version: 0.2.0

I may be misunderstanding your message, but AnnData v0.8 broke forward compatibility: anndata < 0.8 can't read files written with the new serialisation. Because mudata is lean and reuses anndata's I/O internals, and those internals changed, older mudata versions can't work with anndata >= 0.8. mudata >= 0.2 therefore pins its dependency to anndata >= 0.8, so when you install mudata >= 0.2 (the PyPI release will be there soon), the package manager should make sure that anndata is >= 0.8 as well. Conversely, mudata 0.1.2 specifies anndata < 0.8 as its dependency, so if anndata is upgraded to an incompatible version after mudata has been installed, I don't think there's much we can do.

dburkhardt commented 2 years ago

Thanks @gtca! Can you please help me understand why mudata 0.2 isn't on PyPI yet? Is there some reason we should just start using it today?

gtca commented 2 years ago

@dburkhardt, unless I'm missing something, it is though:

[screenshot]

dburkhardt commented 2 years ago

Hmm, okay. Some folks on our team are still hitting this issue; I need to check which versions they're using.