scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add API to move values from X to obs/var/obsm/varm and vice versa #655

Open Zethson opened 2 years ago

Zethson commented 2 years ago

Hi,

For ehrapy we need the ability to move values from X to obs/var/obsm/varm and the other way around. Quoting myself:

It would be very useful to have a method to move data from X to obs and the other way around. I am sure that when loading complex data, one sometimes forgets to add a column to "obs_only", and instead of loading the complete (large) file again, it would help to just move the columns.

@ivirshup already kindly drafted an API for this:

Drafts:

```python
from typing import Union

import anndata as ad
import numpy as np
import pandas as pd


def split_out(adata: ad.AnnData, idx: "np.ndarray[1, bool]", *, axis=1):
    # Move the vars (axis=1) or obs (axis=0) selected by the boolean mask `idx`
    # out of X and into obs (resp. var), modifying `adata` in place.
    idxs = [slice(None), slice(None)]
    idxs[axis] = idx
    idxs = tuple(idxs)
    df = adata[idxs].to_df()  # indexed by obs_names, columns are var_names
    if axis == 1:
        adata._inplace_subset_var(~idx)
        adata.obs = adata.obs.join(df)
    elif axis == 0:
        adata._inplace_subset_obs(~idx)
        adata.var = adata.var.join(df.T)  # transpose so the index matches var_names


def splice_in(
    adata: ad.AnnData,
    *,
    obs: Union[str, list[str], None] = None,
    var: Union[str, list[str], None] = None,
) -> ad.AnnData:
    # Inverse of split_out: turn var (obs) columns back into rows (columns) of X.
    assert (obs is None) + (var is None) == 1  # exactly one of obs/var
    if obs is not None:
        if isinstance(obs, str):
            obs = [obs]
        # transpose so the new block's var_names match adata.var_names
        res = ad.concat([adata, ad.AnnData(adata.var[obs].T)], axis=0)
        # with ad.concat's default merge=None, var columns are not carried
        # over, so the moved columns may already be gone
        res.var.drop(columns=obs, inplace=True, errors="ignore")
        return res
    elif var is not None:
        if isinstance(var, str):
            var = [var]
        res = ad.concat([adata, ad.AnnData(adata.obs[var])], axis=1)
        res.obs.drop(columns=var, inplace=True, errors="ignore")  # see note above
        return res
```

```python
from anndata.tests.helpers import gen_adata

a = gen_adata((20, 10))
b = gen_adata((20, 5))
c = ad.concat({"a": a, "b": b}, axis=1, index_unique="-", label="vartype")
d = c.copy()
d
```

```
AnnData object with n_obs × n_vars = 20 × 15
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
```

```python
removed_var = c.var_names[c.var["vartype"] == "b"]
split_out(d, d.var_names.isin(removed_var))  # convert removed_var to mask
d
```

```
AnnData object with n_obs × n_vars = 20 × 10
    obs: 'gene0-b', 'gene1-b', 'gene2-b', 'gene3-b', 'gene4-b'
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
```

```python
splice_in(d, var=removed_var)
```

```
AnnData object with n_obs × n_vars = 20 × 15
```

@ivirshup, would you be up for implementing this yourself, or should @imipenem have a go at it?

CC @giovp @mbuttner because I was told that this might be useful for you.

Cheers

giovp commented 2 years ago

Yeah, definitely useful for us. We needed something similar for plotting and ended up writing a small "extract" function to create an AnnData object in place with features from obsm: https://squidpy.readthedocs.io/en/latest/api/squidpy.pl.extract.html

An API for that would of course be way more useful and robust in the long term!

ivirshup commented 2 years ago

@Zethson It'd probably happen sooner if someone else implements it! But would like to talk API here first.

ivirshup commented 2 years ago

Related discussion in ehrapy https://github.com/theislab/ehrapy/issues/151

giovp commented 2 years ago

This is really cool.

Zethson commented 2 years ago

It is. But just to make expectations clear: we are implementing a quick-and-dirty version in ehrapy because we need to move fast in the beginning. We might backport this to AnnData at a later point, but we will help with a proper API draft in any case.

ivirshup commented 2 years ago

Pointing out a very very quick version (https://github.com/theislab/ehrapy/issues/151#issuecomment-984956216) for anyone who wants something to try out while thinking about the API.

ivirshup commented 1 year ago

@Zethson, I recall this coming up with some ehrapy devs. Was anyone up for making a PR or pushing this issue forward?

Zethson commented 1 year ago

Yeah, I can see us contributing this back after the ehrapy manuscript is out. I'll revisit this in ~2 months.

ivirshup commented 2 months ago

@Zethson is this still a feature you're interested in?

Zethson commented 2 months ago

Yes.