scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

`polygon_query()` cannot filter the table for CODEX data #540

Closed GuoshengMa closed 7 months ago

GuoshengMa commented 7 months ago

Hi @LucaMarconato @melonora! My SpatialData object contains CODEX data, and I want to get the table rows for an input polygon using polygon_query(), but the function does not return the table.

Here is my code:

image

image

image

I tried the same thing with my 10x Visium data and it works. So I think the polygon_query() function may not recognize the cell coordinate information in the table of the CODEX data.

Here are the files in my CODEX directory:

image

And here is the table of my CODEX data:

image

GuoshengMa commented 7 months ago

I think I've found the reason for this issue: there is no element like cell_boundaries in codex_sdata.shapes.

To be specific, I noticed that you provide a CODEX example in the sandbox. I tried to download it, but the lymphnode.tif file cannot be fully downloaded, which causes a read error (the complete file is 2.1 GB, but the download always stops automatically at around 1 GB). Therefore, I used my own data and read it successfully by following the CODEX code in to_zarr.py. However, this process does not produce a shapes layer related to spot_id or cell_boundaries, as it does for Visium or Xenium. In addition, the codex_sdata.table object does not have the obsm['spatial'] entry, which stores the coordinates of spots and cells in the global coordinate system in visium_sdata and xenium_sdata. I guess this is why polygon_query() cannot return the corresponding codex_sdata['table'], am I right?

melonora commented 7 months ago

We don't use obsm anymore so this should not be a problem. With the table that is present in codex_sdata could you check what kind of information you have in table.uns? For polygon_query to return rows specific to elements, the table must be annotating those particular elements. This is typically specified when parsing the table, but can also be specified later.
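To make the check above concrete, here is a minimal sketch of what "the table must be annotating those particular elements" could look like as a test. It assumes the annotation metadata lives under `table.uns["spatialdata_attrs"]` with the keys `region`, `region_key` and `instance_key`; the helper name `annotation_problems` is hypothetical and not part of spatialdata, and plain dictionaries stand in for `table.uns` and `table.obs`.

```python
def annotation_problems(uns, obs, element_names):
    """Return a list of reasons why a table may not follow its elements.

    `uns` and `obs` stand in for table.uns and table.obs; `element_names`
    is the collection of element names present in the SpatialData object.
    The "spatialdata_attrs" layout is an assumption of this sketch.
    """
    attrs = uns.get("spatialdata_attrs")
    if attrs is None:
        return ["table.uns has no 'spatialdata_attrs' entry"]

    problems = []

    # `region` may be a single element name or a list of names.
    regions = attrs.get("region")
    if isinstance(regions, str):
        regions = [regions]
    if not regions:
        problems.append("no region is declared, so the table annotates nothing")
    for region in regions or []:
        if region not in element_names:
            problems.append(f"region {region!r} is not an element of the object")

    # Both key columns must exist in table.obs.
    for key in ("region_key", "instance_key"):
        column = attrs.get(key)
        if column not in obs:
            problems.append(f"{key} column {column!r} is missing from table.obs")

    return problems


# Hypothetical metadata for a table that annotates an element called "labels".
uns = {
    "spatialdata_attrs": {
        "region": "labels",
        "region_key": "region",
        "instance_key": "cell_id",
    }
}
obs = {"region": ["labels", "labels"], "cell_id": [1, 2]}
print(annotation_problems(uns, obs, {"image", "labels"}))
```

An empty list only means the metadata is self-consistent. It does not verify that the values in the instance key column actually occur in the annotated element.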

melonora commented 7 months ago

Also in case you want to ask a couple of questions and for us to help you out live I would also like to refer you to our community meetings. Schedule here: https://hackmd.io/enWU826vRai-JYaL7TZaSw. That way you can also get all your questions resolved:)

GuoshengMa commented 7 months ago

> We don't use obsm anymore so this should not be a problem. With the table that is present in codex_sdata could you check what kind of information you have in table.uns? For polygon_query to return rows specific to elements, the table must be annotating those particular elements. This is typically specified when parsing the table, but can also be specified later.

I checked table.uns for both my Visium and CODEX data, and it seems complete. Is there any issue here?

image

GuoshengMa commented 7 months ago

> Also in case you want to ask a couple of questions and for us to help you out live I would also like to refer you to our community meetings. Schedule here: https://hackmd.io/enWU826vRai-JYaL7TZaSw. That way you can also get all your questions resolved:)

Great! Thank you for the invitation! I will attend the community meeting on April 11th as scheduled:D

melonora commented 7 months ago

So from a quick glance I don't see 20230517-V52L11-399-A1-TMA3-8-outs as an element in codex_sdata. Is that correct?

GuoshengMa commented 7 months ago

> So from a quick glance I don't see 20230517-V52L11-399-A1-TMA3-8-outs as an element in codex_sdata. Is that correct?

20230517-V52L11-399-A1-TMA3-8-outs is an element in the Visium st_sdata. For codex_sdata, this should be 'labels'.

image

melonora commented 7 months ago

In what coordinate system is the polygon when you assign it to the variable polygon? Did you set the data to be in coordinate system 3D2 before extracting it?

GuoshengMa commented 7 months ago

> In what coordinate system is the polygon when you assign it to the variable polygon? Did you set the data to be in coordinate system 3D2 before extracting it?

I added a square to a shapes layer in the 3D2 coordinate system:

image

Then I used polygon_query() to filter codex_sdata:

image

melonora commented 7 months ago

OK, if none of the others respond before Thursday, let's have a look then.

LucaMarconato commented 7 months ago

Hi @GuoshengMa, obsm['spatial'] is not available in general, as it was added just for the Xenium dataset for legacy reasons. You can see more about this here: https://spatialdata.scverse.org/en/latest/tutorials/notebooks/notebooks/examples/squidpy_integration.html. In particular, in the linked notebook we explain how to proceed in general for any dataset if you still need to have obsm['spatial'].

LucaMarconato commented 7 months ago

Regarding the other part of your question, the CODEX data does not contain cell_boundaries as this is something specific to Xenium data. In general, CODEX data does not contain polygonal elements as the cell geometries are encoded as a raster type (a "Labels" element).

Anyway, this should not be a problem because the polygon_query function is implemented for shapes, points, polygons and images. I would suggest opening the data again with napari-spatialdata and double-checking that, after saving and reopening, the data is indeed under the box. If it is, I would call the get_extent() function on each specific element (the ROI, the image, the labels), specifying the desired coordinate system, and check whether the outputs are bounding boxes that intersect each other. If they do, the problem may be in polygon_query(); if not, the problem is somewhere else.
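The intersection check described above can be sketched in a few lines. This assumes each extent is a mapping from axis name to a `(min, max)` pair, which is how the output of get_extent() is treated here; the helper `extents_intersect` and all the numbers are hypothetical, for illustration only.

```python
def extents_intersect(a, b):
    """Return True if two axis-aligned extents overlap on every shared axis.

    Each extent maps an axis name to a (min, max) pair, for example
    {"x": (0.0, 100.0), "y": (0.0, 50.0)}.
    """
    shared_axes = set(a) & set(b)
    if not shared_axes:
        raise ValueError("the two extents have no axis in common")
    # Two intervals overlap when each one starts before the other one ends.
    return all(
        a[axis][0] <= b[axis][1] and b[axis][0] <= a[axis][1]
        for axis in shared_axes
    )


# Hypothetical extents, all expressed in the same coordinate system.
roi = {"x": (1000.0, 2000.0), "y": (1500.0, 2500.0)}
labels = {"x": (0.0, 4000.0), "y": (0.0, 3000.0)}
far_away = {"x": (5000.0, 6000.0), "y": (0.0, 3000.0)}

print(extents_intersect(roi, labels))    # the ROI lies over the labels
print(extents_intersect(roi, far_away))  # no overlap along x
```

The comparison is only meaningful if every extent was computed in the same coordinate system, which is why the coordinate system has to be specified for each element.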

GuoshengMa commented 7 months ago

> Regarding the other part of your question, the CODEX data does not contain cell_boundaries as this is something specific to Xenium data. In general, CODEX data does not contain polygonal elements as the cell geometries are encoded as a raster type (a "Labels" element).
>
> Anyway, this should not be a problem because the polygon_query function is implemented for shapes, points, polygons and images. I would suggest opening the data again with napari-spatialdata and double-checking that, after saving and reopening, the data is indeed under the box. If it is, I would call the get_extent() function on each specific element (the ROI, the image, the labels), specifying the desired coordinate system, and check whether the outputs are bounding boxes that intersect each other. If they do, the problem may be in polygon_query(); if not, the problem is somewhere else.

@LucaMarconato I tried the method you suggested, but still couldn't get the filtered table.

image

image

image

LucaMarconato commented 7 months ago

Thanks for the extra details, that's indeed unexpected. I will look more into this and try to reproduce. Also, in case you are interested we will have a community meeting today: https://scverse.zulipchat.com/#narrow/stream/315824-spatial/topic/spatialdata.20meetings/near/432649742.

LucaMarconato commented 7 months ago

Hi @GuoshengMa, thanks for attending the community meeting. I have just sent you an email that fixes your problem. The table was not filtered because the labels object was not linked to the table. In your dataset the labels object was a black and white image showing cell boundaries; instead, a labels element is expected to be a tensor of filled cells whose pixel values match the numerical ID of each cell. In the notebook I sent you by email I construct this new object from the information available in your data using flood_fill from skimage.
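The notebook itself is not part of this thread. As a rough illustration of the idea only, the sketch below turns a binary boundary image into a filled integer label image. It is not the code from the notebook: it uses a small breadth-first flood fill written with NumPy as a stand-in for flood_fill from skimage, and it assumes that any region touching the image border is background. The function name `boundaries_to_labels` is hypothetical.

```python
from collections import deque

import numpy as np


def boundaries_to_labels(boundaries):
    """Turn a binary boundary mask into an integer label image.

    Boundary pixels stay 0. Regions touching the image border are treated
    as background and also stay 0 (an assumption of this sketch). Every
    other enclosed region gets its own positive integer id.
    """
    boundaries = np.asarray(boundaries, dtype=bool)
    height, width = boundaries.shape
    labels = np.zeros((height, width), dtype=np.int32)
    visited = boundaries.copy()  # boundary pixels are never filled
    next_id = 1

    for r0 in range(height):
        for c0 in range(width):
            if visited[r0, c0]:
                continue
            # Breadth-first flood fill with 4-connectivity from this seed.
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            rows, cols = [], []
            touches_border = False
            while queue:
                r, c = queue.popleft()
                rows.append(r)
                cols.append(c)
                if r in (0, height - 1) or c in (0, width - 1):
                    touches_border = True
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < height and 0 <= nc < width and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                labels[rows, cols] = next_id
                next_id += 1

    return labels


# Two adjacent square "cells" drawn as boundaries only.
mask = np.zeros((5, 9), dtype=bool)
mask[[0, 4], :] = True
mask[:, [0, 4, 8]] = True
print(boundaries_to_labels(mask))
```

For the table to follow the labels during a query, the ids written into the label image would still have to match the values in the table's instance key column. How that matching is done depends on the dataset and is not shown here.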