scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License
235 stars 43 forks source link

Bounding box query vs polygon query for circles #669

Open LucaMarconato opened 3 months ago

LucaMarconato commented 3 months ago

Bounding box query selects circles whose center is contained inside the box, polygon query selects circles that intersect the box in any point.

You can reproduce with this code:

##
from spatialdata.datasets import blobs
from napari_spatialdata import Interactive
import matplotlib.pyplot as plt
import spatialdata_plot
import spatialdata as sd
from array import array
from shapely.geometry import Polygon
from geopandas import GeoDataFrame

plt.style.use('dark_background')

# define square to use as query geometry
x_coords = array("d", [243.331349706879, 348.864955663264, 348.8649556632623, 243.3313497068773, 243.331349706879])
y_coords = array("d", [392.5609882574645, 392.56098825746835, 214.4730282060675, 214.47302820606367, 392.5609882574645])

minx, maxx = min(x_coords), max(x_coords)
miny, maxy = min(y_coords), max(y_coords)

# rearrange to the vars needs to a bounding_box_query
min_coordinates = [minx, miny]
max_coordinates = [maxx, maxy]

# turn into shapely polygon
polygon = Polygon(zip(x_coords, y_coords))

# Print the results
print("Min coordinates:", min_coordinates)
print("Max coordinates:", max_coordinates)
print("Polygon:", polygon)

##
sdata = blobs().subset(["blobs_polygons", "blobs_circles", "blobs_image"])
sdata["box"] = sd.models.ShapesModel.parse(GeoDataFrame(geometry=[polygon]))

##
sdata.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

## bounding box query
queried_bb = sdata.query.bounding_box(
    axes=("x", "y"), min_coordinate=min_coordinates, max_coordinate=max_coordinates, target_coordinate_system="global"
)

queried_bb.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

## polygon query
queried_pol = sdata.query.polygon(polygon, target_coordinate_system='global')
queried_pol.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

Which gives these three plots.

Full data

image

Bounding box query

image

Polygon query

image

LucaMarconato commented 3 months ago

A solution would be to benchmark the polygon query and bounding box implementation for points and shapes and see if one is significantly faster than the other. Ideally the polygon query is fast enough and we can drop the bounding box query implementation for points and shapes.

Note that polygon query implementation for raster types is simply calling the implementation of bounding box query for raster types; this will not change.

giovp commented 2 months ago

Ideally the polygon query is fast enough and we can drop the bounding box query implementation for points and shapes.

good point, I can see that it's faster with geopandas already vectorized methods, but would be good to benchmark indeed