scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License
211 stars 40 forks source link

Missing filtered table in polygon_query() of MERFISH data #633

Open simonmfr opened 1 month ago

simonmfr commented 1 month ago

Using this tutorial for MERFISH data, I found that the output of polygon_query(sdata, my_napari_annotation) does not contain any filtered table, despite my_napari_annotation covering multiple cells. Is there something missing in my code?

With filter_table=False, I get the unfiltered table, which is not what I am looking for. This is similar to https://github.com/scverse/spatialdata/issues/532

Much appreciated!

Details:

# load

sdata = sio.merscope(full_paths, backend="rioxarray", slide_name="s1", region_name="r0")

sdata
SpatialData object
├── Points
│     └── 's1_r0_transcripts': DataFrame with shape: (<Delayed>, 9) (2D points)
├── Shapes
│     └── 's1_r0_polygons': GeoDataFrame shape: (96632, 9) (2D shapes)
└── Tables
      └── 'table': AnnData (96632, 500)
with coordinate systems:
    ▸ 'global', with elements:
        s1_r0_transcripts (Points), s1_r0_polygons (Shapes)

sdata.table.uns
OrderedDict([('spatialdata_attrs',
              {'region': 's1_r0_polygons',
               'region_key': 'cells_region',
               'instance_key': 'EntityID'})])

sdata["table"]
AnnData object with n_obs × n_vars = 96632 × 500
    obs: 'fov', 'volume', 'center_x', 'center_y', 'min_x', 'min_y', 'max_x', 'max_y', 'anisotropy', 'transcript_count', 'perimeter_area_ratio', 'solidity', 'DAPI_raw', 'DAPI_high_pass', 'PolyT_raw', 'PolyT_high_pass', 'region', 'slide', 'dataset_id', 'cells_region', 'EntityID'
    uns: 'spatialdata_attrs'
    obsm: 'blank', 'spatial'

print(sd.SpatialData.get_annotated_regions(sdata["table"]))
s1_r0_polygons
# annotate shape

interactive = napari_spatialdata.Interactive(sdata)

interactive.run() 

# export shape layer from Napari

shape = "overlap"

sdata[shape]
geometry
0   POLYGON ((5423.45 1777.948, 6398.625 1777.948,...

sdata[shape].geometry.iloc[0]
image
sdata
SpatialData object
├── Points
│     └── 's1_r0_transcripts': DataFrame with shape: (<Delayed>, 9) (2D points)
├── Shapes
│     ├── 'overlap': GeoDataFrame shape: (1, 1) (2D shapes)
│     └── 's1_r0_polygons': GeoDataFrame shape: (96632, 9) (2D shapes)
└── Tables
      └── 'table': AnnData (96632, 500)
with coordinate systems:
    ▸ 'global', with elements:
        s1_r0_transcripts (Points), overlap (Shapes), s1_r0_polygons (Shapes)
# sd.polygon_query

subset = sd.polygon_query(
    sdata, polygon=sdata[shape].geometry.iloc[0], target_coordinate_system="global", filter_table=True
)

subset
SpatialData object
└── Shapes
      └── 'overlap': GeoDataFrame shape: (1, 1) (2D shapes)
with coordinate systems:
    ▸ 'global', with elements:
        overlap (Shapes)

# subset does not contain filtered table
LucaMarconato commented 1 month ago

Hi, here the problem seems not to be related to the table but to the fact that no instance of 's1_r0_polygons' is returned after the polygon query (hence no table row associated to the returned instances).

Without the data it's difficult to understand what is going on; which version of the spatialdata and napari-spatialdata io are you using? Could you take some screenshot or even better record a short gif showing how you annotate the region with napari? Thank you.

simonmfr commented 1 month ago

Hi, thanks for the quick reply. Here's a gif showing how I create the manual annotations. I save the layer using shift+E. Does that help?

output2

package versions: anndata 0.10.8 napari 0.4.19.post1 napari_builtins 0.4.19.post1 napari_matplotlib 2.0.1 napari_plugin_engine 0.2.0 napari_spatialdata 0.5.0.post1 scanpy 1.10.2 spatialdata 0.2.1 spatialdata_io 0.1.3.post0

LucaMarconato commented 1 month ago

Thanks for sharing the recording. Here are my considerations.

These are the coordinates of the box that you drew:

Screenshot 2024-07-11 at 19 15 42 Screenshot 2024-07-11 at 19 16 18

These are instead the coordinates that you print in the notebook:

image

The mismatch could be in principle due to a bug in handing of coordinate transformations or in the query function, but here I think this is not the cause. I think the origin of the bug is that, as you can see from this screenshot:

image

you are getting an error when pressing Shift + E due to a bug, that seems to be stopping the saving when the object is not backed from disk. We will track the bug here: https://github.com/scverse/napari-spatialdata/issues/271.

I suggest to trying saving the SpatialData object by calling sdaa.write(your_path), and then reading it again with SpatialData.read(your_path) before calling Interactive(). The error above should disappear.

Also, a double check is to open again napari after you saved the shapes and closed it. If shape not where you drew it, then the bug is probably not in the query function; if the shape is where you drew it, probably it is.

simonmfr commented 1 month ago

Thank you, and sorry for the delay. Even when loading a saved SpatialData file with associated Zarr store I found that the output of sd.polygon_query() with filter_tables=True does not contain any table. I've now found a temporary workaround with other another software but I'm happy to assist you reproducing the bug.

LucaMarconato commented 2 weeks ago

Glad that you found a workaround. Yes, helping reproducing the bug would be helpful thank you.

Could you please use one of these 3 datasets? Thanks.

from spatialdata.datasets import blobs_annotating_element

sdata = blobs_annotating_element('blobs_polygons')
sdata = blobs_annotating_element('blobs_labels')
sdata = blobs_annotating_element('blobs_circles')