scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

support gene panel selection/adaptation #1697

Open mojtababahrami opened 1 month ago

mojtababahrami commented 1 month ago

I have run into this many times: I have two AnnData objects, A and B, and I want B to have the same vars as A, keeping the shared genes and filling in the genes that are in A but not in B with a default value (e.g. zeros). I searched the API but did not find a function for this. Currently I do something like the following:

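import anndata as ad
# Outer join keeps genes from both objects; fill_value=0 pads missing entries with zeros (dense arrays default to NaN).
adata = ad.concat([adata_train, adata_test], join='outer', fill_value=0)
# Recover the test rows, then keep only the training genes in training order.
adata_test = adata[adata_train.n_obs:]
adata_test = adata_test[:, adata_train.var_names].copy()
assert (adata_train.var_names == adata_test.var_names).all()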

It would be good to support this natively, as I assume it is a very common use case (for example, you fit a model such as a simple PCA on A and want to project/transform B with that model).

So I imagine something like this:

adata_test.adapt_vars(adata_train, fill_value=0)

ilan-gold commented 1 month ago

Hmmm @mojtababahrami, this is an interesting idea... I will need to think about it a bit. My initial reaction is that this should be fairly straightforward, but there may be some rough edges. Thanks for the issue!