scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

Faster version of `aggregate()` method available #744

Open LucaMarconato opened 2 weeks ago

LucaMarconato commented 2 weeks ago

While extending SOPA for Visium HD data, @quentinblampey encountered a performance bottleneck with `aggregate()` that he was able to work around using pure geopandas code. Since we use geopandas internally, this could be a bug in spatialdata that may be easy to fix.

Here is the SOPA code: https://github.com/gustaveroussy/sopa/blob/master/sopa/segmentation/aggregation.py#L485

berombau commented 1 week ago

See Harpy aggregate implementation as mentioned by @ArneDefauw in #677.

ArneDefauw commented 1 week ago

> See Harpy aggregate implementation as mentioned by @ArneDefauw in #677.

Just a side note: https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/table/_allocation_intensity.py#L22 uses https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/utils/_aggregate.py#L16. The latter is more general: it supports aggregation between a labels layer and image layers, similar to xr_spatial.zonal_stats but faster, and it also supports custom aggregations: https://github.com/saeyslab/harpy/blob/6b80d01baa11c0ee9ecdfb48d5b0d72be305cb2e/src/sparrow/utils/_aggregate.py#L251
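
To make "aggregation between a labels layer and image layers" concrete, here is a minimal NumPy sketch of a zonal mean. It is not the Harpy implementation and not `xr_spatial.zonal_stats`; the function name `zonal_mean`, the `(c, y, x)` image layout, and the convention that label 0 is background are assumptions for illustration only.

```python
import numpy as np


def zonal_mean(labels: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Mean intensity per label and channel (hypothetical helper).

    labels: (y, x) array of non-negative ints, 0 assumed to be background.
    image:  (c, y, x) array of intensities.
    Returns an array of shape (n_labels, n_channels), background dropped.
    """
    flat = labels.ravel()
    n = int(flat.max()) + 1
    # Number of pixels covered by each label id.
    area = np.bincount(flat, minlength=n)
    # Per-channel intensity sums for each label id.
    sums = np.stack(
        [np.bincount(flat, weights=ch.ravel(), minlength=n) for ch in image]
    )
    # Label ids that never occur have area 0 and yield NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / area
    return means[:, 1:].T


# Toy data: two labelled regions, two image channels.
labels = np.array([[0, 1, 1],
                   [2, 2, 0]])
image = np.array([
    [[5.0, 1.0, 3.0], [2.0, 4.0, 9.0]],
    [[0.0, 10.0, 20.0], [30.0, 50.0, 0.0]],
])
means = zonal_mean(labels, image)
print(means)
```

Each row of `means` corresponds to one label id (1, 2, ...) and each column to one channel. Using `np.bincount` avoids a Python loop over labels, which is the kind of vectorization that usually makes this style of aggregation fast.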

I think https://github.com/gustaveroussy/sopa/blob/f1f5a99ee7f5a9489e511241a3a62bb520ec9860/sopa/segmentation/aggregation.py#L485 focuses on aggregation between shapes layers and bins.
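
For readers unfamiliar with the shapes-to-bins case, the following is a rough sketch of how such an aggregation could be written with plain geopandas. It is not the SOPA code linked above and not spatialdata's `aggregate()`; the helper name `aggregate_bins_sum`, the toy geometries, and the choice to sum every bin that intersects a cell are all assumptions made for the example.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import box


def aggregate_bins_sum(bins, cells, counts):
    """Sum per-bin counts over every cell polygon a bin intersects.

    Hypothetical helper: assumes both GeoDataFrames use a default
    RangeIndex, and that counts has shape (n_bins, n_genes).
    """
    joined = gpd.sjoin(bins, cells, how="inner", predicate="intersects")
    bin_idx = joined.index.to_numpy()                # positions in `bins`
    cell_idx = joined["index_right"].to_numpy()      # positions in `cells`
    out = np.zeros((len(cells), counts.shape[1]))
    # Unbuffered add, so several bins can accumulate into the same cell.
    np.add.at(out, cell_idx, counts[bin_idx])
    return out


# Toy data: a 4x4 grid of unit-square bins and two square "cells".
bins = gpd.GeoDataFrame(
    geometry=[box(x, y, x + 1, y + 1) for y in range(4) for x in range(4)]
)
cells = gpd.GeoDataFrame(
    geometry=[box(0.1, 0.1, 1.9, 1.9), box(2.1, 2.1, 3.9, 3.9)]
)
# Two made-up "genes": the bin index itself, and a constant 1 per bin.
counts = np.column_stack([np.arange(16.0), np.ones(16)])

out = aggregate_bins_sum(bins, cells, counts)
print(out)
```

The spatial join does the geometric work once, and the accumulation is a single vectorized step. For real Visium HD data the count matrix would be sparse, so a sparse bin-by-cell indicator matrix multiplied with the counts would likely be a better fit than the dense `np.add.at` used here.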