scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

Filter spatialData #280

Open lopollar opened 1 year ago

lopollar commented 1 year ago

Idea for enhancement: add a filter option to the SpatialData object that makes it possible to filter out certain cells based on a column in the table. This can be relevant, for example, for removing large cells, in line with sc.pp.filter_cells. A similar function exists in spatialdata, sdata.pp.get_elements, but it does not work on obs or shapes information. The proposed filter would work on shapes and table simultaneously. Currently, I only get this to work by overwriting the objects, as shown here. Is there a better way?

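import numpy as np
import spatialdata

# `sdata` is an existing SpatialData object; its table is assumed to be filtered already.
for name in list(sdata.shapes):
    # Cast the shape index to str so it is comparable with the table's obs index.
    sdata[name].index = sdata[name].index.astype("str")
    # Keep only the shapes whose index still appears in the table.
    kept = sdata[name][np.isin(sdata[name].index.values, sdata.table.obs.index.values)]
    # Overwrite the element with the filtered, re-validated shapes.
    sdata.add_shapes(
        name=name,
        shapes=spatialdata.models.ShapesModel.parse(kept),
        overwrite=True,
    )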
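
To make the request concrete, here is a minimal sketch of the behaviour being asked for, written with plain pandas stand-ins rather than the spatialdata API. The helper name `filter_by_obs` and the toy `obs` and `shapes` frames are hypothetical; `obs` stands in for `sdata.table.obs` and `shapes` for one shapes element, and both are assumed to share cell IDs as their index.

```python
import pandas as pd


def filter_by_obs(obs: pd.DataFrame, shapes: pd.DataFrame, mask: pd.Series):
    """Keep the rows of `obs` where `mask` is True and the matching rows of `shapes`.

    Hypothetical helper for illustration only; not part of spatialdata.
    """
    kept_obs = obs.loc[mask]
    # Compare indices as strings, mirroring the astype("str") step in the loop above.
    shape_ids = shapes.index.astype(str)
    kept_shapes = shapes.loc[shape_ids.isin(kept_obs.index.astype(str))]
    return kept_obs, kept_shapes


# Toy data: three cells, one of them large.
obs = pd.DataFrame({"area": [10.0, 250.0, 40.0]}, index=["0", "1", "2"])
shapes = pd.DataFrame({"radius": [1.8, 8.9, 3.6]}, index=[0, 1, 2])

# Drop large cells from the table and the shapes in one call.
small_obs, small_shapes = filter_by_obs(obs, shapes, obs["area"] < 100)
```

The point of the sketch is that the table and the shapes are filtered together from a single mask, so they cannot drift out of sync the way they can when each element is overwritten by hand.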

giovp commented 1 year ago

This is very much on the roadmap, @lopollar, thanks for bringing it up!

I think this could be done in two different ways (both could be supported):

1) Boolean-mask indexing on the object itself:

>>> sdata = sdata[sdata.table.obs.celltype == "celltypeA"]
>>> sdata.table.obs.celltype.unique()
["celltypeA"]

2) A query-string interface along the lines of pandas.DataFrame.query: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html
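
For comparison, the two styles can be tried side by side on a plain pandas DataFrame. This is only a sketch of the selection syntax: the `obs` frame below is a made-up stand-in for `sdata.table.obs`, and neither form is claimed to be supported on a SpatialData object.

```python
import pandas as pd

# Made-up stand-in for `sdata.table.obs`.
obs = pd.DataFrame(
    {
        "celltype": ["celltypeA", "celltypeB", "celltypeA"],
        "area": [10.0, 250.0, 40.0],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Style 1: boolean mask built from column comparisons.
by_mask = obs[obs["celltype"] == "celltypeA"]
by_mask_combined = obs[(obs["celltype"] == "celltypeA") & (obs["area"] < 100)]

# Style 2: the same combined condition as a query string.
by_query = obs.query('celltype == "celltypeA" and area < 100')
```

Both styles select the same rows here. The mask form composes with ordinary Python expressions, while the query form tends to read more compactly when several conditions are combined.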

wdyt?

lopollar commented 1 year ago

Hi, for me, the first feels like how I would code it myself, so I would prefer that one! Thank you for taking my suggestion into account!