Open xinyuejohn opened 1 year ago
Hi! Thanks for the feature request. I think that’s feasible, but I need to discuss this with @ivirshup and @ilan-gold. We need to formalize what the supported array types in all of anndata’s fields are.
I had hoped that this would eventually be solved by #244.
Back in the PR that introduced awkward arrays, we decided against implementing it in X (for now) as it would have required duplication of a lot of custom code. Checking the constraints on X is already a huge mess and adding the checks for awkward arrays makes it worse.
Personally, I'd suggest you set adata.X = None and just put it in a layer.
@grst
> Personally, I'd suggest you set adata.X = None and just put it in a layer.
This'd mean that people who load complex EHR data will have an "empty" object. Yes, everything is in a layer, but one needs to either always pass the layer argument when working with it or copy it to X, which, err, doesn't work. Just not the nicest experience.
It'd also deviate from the rest of the scverse workflows, where the working data is usually in X.
I want everything in layers but scverse is not there yet.
> It'd also deviate from the rest of the scverse workflows where the working data is usually in X.
In scirpy, X is empty by default (unless you store paired gene expression in the same AnnData object, which is not recommended in favor of MuData). The TCR data is in .obsm.
It of course depends on your interface, but at least in the scirpy case only very advanced users would want to interact with the awkward array directly. All others only access it through scirpy API calls (including a get function to retrieve some variables), and there you can just set an appropriate default to get it from a layer or obsm.
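To make the "appropriate default" idea concrete, here is a minimal sketch of an accessor that looks in a layer first and falls back to obsm. It does not use the anndata or scirpy APIs: `Container`, `get_values`, and the `"ehr"` key are hypothetical stand-ins invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Container:
    """Hypothetical stand-in with AnnData-like slots; NOT the anndata API."""

    X: Optional[Any] = None
    layers: Dict[str, Any] = field(default_factory=dict)
    obsm: Dict[str, Any] = field(default_factory=dict)


DEFAULT_KEY = "ehr"  # hypothetical default key, chosen only for this sketch


def get_values(data: Container, key: str = DEFAULT_KEY) -> Any:
    """Return the working data: layer first, then obsm, then X as a last resort."""
    if key in data.layers:
        return data.layers[key]
    if key in data.obsm:
        return data.obsm[key]
    if data.X is not None:
        return data.X
    raise KeyError(f"no data found under {key!r} in layers or obsm, and X is empty")


# X stays empty; the ragged data lives in a layer under the default key.
data = Container(layers={"ehr": [[1, 2], [3]]})
values = get_values(data)
```

With a getter like this, API functions can call `get_values(data)` internally, so most users never have to pass the key themselves.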
> I want everything in layers but scverse is not there yet.
and why repeat the old mistake for new packages
I would suggest you try working with it in layers for now too. Most scverse workflows assume the data is in X, but most scverse workflows also assume that X and layers contain matrix-like arrays with homogeneous dtypes.
I would be interested in hearing how this goes.
> I want everything in layers but scverse is not there yet.
> and why repeat the old mistake for new packages
Because it builds upon scanpy, which assumes that it works with X by default.
But yeah, I could probably pass a default layer everywhere and modify that behavior.
Please describe your wishes and possible alternatives to achieve the desired result.
I'm thrilled to see that AnnData now supports awkward arrays. This feature has been incredibly useful. I'd like to inquire if there are plans to extend this support to the X of AnnData. Implementing this would significantly benefit our ongoing projects with ehrapy 2.0 (https://github.com/theislab/ehrapy) and EHRData.
To explain further: in our current use of AnnData with ehrapy, each patient is represented as a row with several variables. However, as shown in the figure below, some of the variables can't fit into the current X (a numpy array) because they are lists of lists or lists of dicts. Users nevertheless expect to process these data, for example to compute statistics (min/max/avg) or perform imputation. We therefore don't want to store these variables in .layers, .obsm, or .varm, because that is not user-friendly and adds complexity to integrating the data into computational workflows.
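As a rough illustration of the kind of data and processing meant here, the sketch below uses plain Python lists rather than AnnData or awkward arrays. The `heart_rate` values and the `summarize` helper are made up for this example.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical ragged variable: one list of measurements per patient (row).
# The second patient has no measurements at all.
heart_rate: List[List[float]] = [[72.0, 75.0, 71.0], [], [88.0, 90.0]]


def summarize(rows: List[List[float]]) -> List[Dict[str, Optional[float]]]:
    """Compute per-patient min/max/mean; rows without measurements yield None."""
    summary = []
    for values in rows:
        if not values:
            summary.append({"min": None, "max": None, "mean": None})
        else:
            summary.append(
                {"min": min(values), "max": max(values), "mean": mean(values)}
            )
    return summary


stats = summarize(heart_rate)
```

Because each row has a different length, this cannot be stored as a regular 2D numpy array without padding or an object dtype, which is why a ragged array type is attractive for this use case.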
Is there an estimated timeline for when we might expect this feature? Thanks for your continuous efforts in improving AnnData!