Closed VladimirShitov closed 1 year ago
Hey @VladimirShitov,
While I can't actually guarantee the current version of MuData supports reassigning modalities with intentionally defined behaviour, it does actually work and seems to work as expected. The multimodal object contains (1) modalities and (2) multimodal information. Individual modalities can be modified but for MuData to know about this (and to update the multimodal information accordingly), one has to explicitly do this by calling mdata.update()
(also see here). Given that, the following works:
from mudata import AnnData, MuData, read_h5mu
import numpy as np
import jax
x = np.array(jax.random.normal(jax.random.PRNGKey(1), (10, 100)))
adata = AnnData(x)
mdata = MuData({"one": adata})
mdata.shape # => (10, 100)
adata2 = adata[:,:20].copy()
mdata.mod["one"] = adata2
mdata.update()
mdata.shape # => (10, 20)
mdata.write("issue27.h5mu")
read_h5mu("issue27.h5mu").shape # => (10, 20)
When calling .write()
, .update()
should actually also be called automatically inside it so that unupdated information is not serialised.
With this, I cannot reproduce the issue with serialising mdata
— unless the feature names are changed before assignment. If they were, one should get an informative error message (at least on the latest master
):
adata2 = adata[:,:20].copy()
adata2.var_names = [f"var{i}" for i in adata2.var_names]
mdata.mod["one"] = adata2
mdata.update()
# => NotImplementedError: var_names seem to have been renamed and filtered at the same time. There is no way to restore the order.
That seems reasonable: with a different number of features and different names for them, MuData has no way of knowing which ones were kept after filtering data.
If one assigned it as another modality instead, it works as expected:
mdata = MuData({"one": adata})
mdata.mod["two"] = adata2
mdata.update()
mdata.shape # => (10, 120)
I hope that makes sense!
Thank you for the answer! Indeed, feature names have been changed in my case. I solved my problem anyway.
By the way, maybe it makes sense to run .update()
automatically every time when assigning something to modality? It can be done using @property
and a setter, though I am not sure how to do it with dictionary values.
Thanks @VladimirShitov! It should possible to wrap the dictionary of modalities into a custom class to do that but .update()
will still be required in other cases as MuData's attributes can't be updated automatically when AnnData objects change externally. We can definitely look into reducing the amount of .update()
calls required when we technically can though (incl. muon calls).
Just to note, (re)assigning modalities is not something encouraged by design in the current workflows (e.g. mdata[modality]
does not have a setter).
In case there are use cases that you find particularly useful and that would require changes like this to be implemented, feel free to share so that we can discuss them and figure out the best way to address them!
Describe the bug Imagine that you have Mudata object named
mdata
. You extract one modality from it to AnnData objectadata
, do something with it (e.g., subsetting genes) and then trying to storeadata
in the same modality it came from. It doesn't work as expected though. Then_vars
is inconsistent with actual number of variables:It also causes the problem with saving file:
Weirdly, after trying to save file and getting error, everything seems to be fixed:
To Reproduce Hope, the problem is clean from the screenshot. If you need the reproducible example, please write the comment about it
Expected behaviour Consistent number of variables and saving data from the first try
System