scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

proposal: use (dask/geo) DataFrames for shapes (and polygons?) #122

Open giovp opened 1 year ago

giovp commented 1 year ago

Right now the Shapes model is an AnnData object: we just save the coordinates in adata.obsm["spatial"] and the metadata in uns. I feel this is overkill, and we don't really have plans to extend this spec further (though we did discuss potentially moving it to a GeoDataFrame, or storing shapes like points, i.e. as dask DataFrames). I'd suggest that by the release we convert Shapes to a GeoDataFrame or a dask DataFrame.

@scverse/spatialdata
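
As a rough illustration of the proposed conversion, here is a minimal sketch assuming geopandas, shapely and NumPy are installed. The `coords` and `radius` arrays are hypothetical stand-ins for what currently lives in adata.obsm["spatial"] and in the metadata; the column names are made up and not part of any spatialdata spec.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon

# Hypothetical stand-ins for adata.obsm["spatial"] and the per-shape metadata.
coords = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 7.0]])
radius = np.array([1.0, 2.0, 0.5])

# Shapes (e.g. circles) as point geometries plus a radius column.
shapes = gpd.GeoDataFrame(
    {"radius": radius},
    geometry=gpd.points_from_xy(coords[:, 0], coords[:, 1]),
)

# The same container type can also hold polygons (here a 2x2 square).
polygons = gpd.GeoDataFrame(
    {"cell_id": ["cell_0"]},
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])],
)
```

With this layout, shapes and polygons share one element type, which is the unification discussed in this thread.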

LucaMarconato commented 1 year ago

I agree, we need to unify the types we use for elements; it's too heterogeneous at the moment.

LucaMarconato commented 1 year ago

Wow, maybe we have the perfect solution, and it also applies to polygons! https://github.com/geopandas/dask-geopandas I think a GeoDataFrame (with dask) would be the best solution, because we also consider Squares to be Shapes.

giovp commented 1 year ago

I like the idea a lot, but maybe it's also overkill for the moment? The shapes and polygons are in the order of 10-100_000 for the current datasets, so plain geopandas might be enough?

LucaMarconato commented 1 year ago

I agree, from the performance point of view it may be overkill, but there is another implication of backing vs. non-backing that we may be interested in. If backing implies that when we modify something in memory it is also modified on disk (as happens with h5py), then it would be desirable for all the elements to be backed. Otherwise, for certain elements (currently the table behaves like this), we need to rewrite the whole element so that the changes are reflected in the on-disk storage.

We have to test how the objects behave. Keeping track of that in this other issue: https://github.com/scverse/spatialdata/issues/126.
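
To make the backed vs. non-backed distinction concrete, here is a minimal sketch using h5py directly, assuming h5py and NumPy are installed. The file name and the dataset name `coords` are made up for illustration, and this says nothing about how geopandas or dask-geopandas objects behave, which is exactly what still has to be tested.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "backed_demo.h5")

# Create a small dataset on disk.
with h5py.File(path, "w") as f:
    f.create_dataset("coords", data=np.zeros((3, 2)))

# Backed: assigning through the dataset handle writes to the file.
with h5py.File(path, "r+") as f:
    f["coords"][0] = [1.0, 2.0]

# Non-backed: slicing with [:] loads an independent in-memory copy,
# so edits to it do not reach the file unless it is written back.
with h5py.File(path, "r") as f:
    in_memory = f["coords"][:]
in_memory[1] = [9.0, 9.0]

# Re-read the file: only the backed edit is present.
with h5py.File(path, "r") as f:
    on_disk = f["coords"][:]
```

Here the first edit is visible on disk, while the second one stays in memory until the array is explicitly rewritten.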

LucaMarconato commented 1 year ago

Not sure why I closed this 👀 It's still open, and the following issue is a duplicate of it (I remembered this discussion but couldn't find this issue anymore): https://github.com/scverse/spatialdata/issues/359