scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

Channel Equivalent For `Labels2DModel` #338

Open srivarra opened 1 year ago

srivarra commented 1 year ago

Thanks for developing SpatialData! It's been a very ergonomic experience so far, converting a small subset of our lab's current multiplexed imaging pipeline as an MVP for testing.

One particular wishlist item would be an $N$-dimensional implementation of `Labels2DModel`: for each `SpatialImage`, a user may want to have several types of labels, such as nuclear and whole-cell masks.

Here is what I have currently:

Current `sdata` setup:

```
SpatialData object with:
├── Images
│   ├── 'fov0': SpatialImage[cyx] (22, 512, 512)
│   ├── 'fov1': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov2': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov3': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov4': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov5': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov6': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov7': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov8': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov9': SpatialImage[cyx] (22, 1024, 1024)
│   └── 'fov10': SpatialImage[cyx] (22, 1024, 1024)
└── Labels
    ├── 'fov0_nuclear': SpatialImage[yx] (512, 512)
    ├── 'fov0_whole_cell': SpatialImage[yx] (512, 512)
    ├── 'fov1_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov1_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov2_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov2_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov3_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov3_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov4_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov4_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov5_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov5_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov6_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov6_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov7_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov7_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov8_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov8_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov9_nuclear': SpatialImage[yx] (1024, 1024)
    ├── 'fov9_whole_cell': SpatialImage[yx] (1024, 1024)
    ├── 'fov10_nuclear': SpatialImage[yx] (1024, 1024)
    └── 'fov10_whole_cell': SpatialImage[yx] (1024, 1024)
```

Ideally something like this would be great:

`sdata` with `C` equivalent:

```
SpatialData object with:
├── Images
│   ├── 'fov0': SpatialImage[cyx] (22, 512, 512)
│   ├── 'fov1': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov2': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov3': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov4': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov5': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov6': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov7': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov8': SpatialImage[cyx] (22, 1024, 1024)
│   ├── 'fov9': SpatialImage[cyx] (22, 1024, 1024)
│   └── 'fov10': SpatialImage[cyx] (22, 1024, 1024)
└── Labels
    ├── 'fov0': SpatialImage[cyx] (2, 512, 512)
    ├── 'fov1': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov2': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov3': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov4': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov5': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov6': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov7': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov8': SpatialImage[cyx] (2, 1024, 1024)
    ├── 'fov9': SpatialImage[cyx] (2, 1024, 1024)
    └── 'fov10': SpatialImage[cyx] (2, 1024, 1024)
```

This would be pretty convenient when, say, indexing a particular SpatialImage along with 1, 2, or $n$ potential label masks.

LucaMarconato commented 1 year ago

Hi @srivarra, thank you for the feedback and for reaching out!

The use case that you mentioned is important and we are happy to explore improvements to the library to support it. I'll describe the rationale behind the current implementation, the recommended workflow and possible improvements.

The current implementation

During the design phase we decided to omit the `c` dimension from labels (and thus not to support multiple alternative labelings of the same fov within a single labels tensor) because of the implications this would have when linking an external table to the object. In fact, we wanted to keep the proposal for table annotation as simple as possible, in order to make storage straightforward and reduce branching in the context of interoperability between tools.

Similarly, we also decided not to represent a direct link between an image and a labels element (i.e. "these labels are for this image"), so as not to limit how labels can be used and to allow applying the same labels to images that only partially overlap or have different resolutions.

A final note: the same SpatialData object does not support using the same name for different spatial elements. The second object that you posted would trigger an error during the validation of the elements.

Recommended workflow

The way that we recommend dealing with your use case would be to create a coordinate system for each fov and map every element that belongs to that fov to the same coordinate system. An example of this is shown in this file https://github.com/scverse/spatialdata-io/blob/main/src/spatialdata_io/readers/cosmx.py.

This approach, combined with the use of the rasterize() function, would even allow mapping labels and images that have different resolutions and don't fully overlap (in your case you don't need rasterize(), because you have full overlap and the same resolution).

Another important aspect to consider is being able to add annotations to multiple instances of the various labels. This is currently possible by using a single table, which can be mapped to different objects (see the example in the cosmx.py file mentioned above) via the region, region_key and instance_key metadata. However, since multiple users have reported that this metadata is confusing, we are working on improving its ergonomics and on supporting multiple annotation tables. In this way one could have, for each labels object, a separate set of annotations stored as linked tables. The new table design is described here https://github.com/scverse/spatialdata/issues/298 and we will work on it together with external collaborators in 3 weeks.

Possible improvements

What do you think of the above workflows? Would they work for you, or do you think that we could improve something? Also, we will keep in mind the request of adding c for labels, so that if other users or developers ask for this in the future, we can have a joint discussion about a possible modification of the design; for the moment I would keep it as it is, so as not to complicate the link between labels and tables.

srivarra commented 1 year ago

I think that coordinate system approach would work. I remember trying it a little while ago and couldn't figure it out without having the table. The new table design also sounds very nice.

Currently, to iterate over FOVs and their associated labels, I adapted the accessor code from scverse/spatialdata-plot@main/src/spatialdata_plot/pp/basic.py and scverse/spatialdata-plot@main/src/spatialdata_plot/_accessor.py, and wrote some custom iterator accessors.

Is it possible to create coordinate systems without a table, or should they be created after all the data is acquired?

For example, here is my current workflow:

  1. Load in tiffs and convert them to Images2D.
  2. Save the loaded data as an ome.zarr store.
  3. Iterate over each fov and generate the segmentation masks.
  4. Iterate over each fov and its associated segmentation mask and run scikit-image's regionprops (this gives me the equivalent of AnnData.obs from the technology_mibitof.ipynb notebook). Then I use aggregate to sum the signal w.r.t. the segmentation mask.
     a. This returns a SpatialData object with AnnData.X, and then I can concatenate all the AnnData tables.

And I create the coordinate system after this step, correct?

Thank you for the detailed response!

LucaMarconato commented 1 year ago

> I think that coordinate system approach would work. I remember I tried it a little while ago, and couldn't figure it out without having the table.

Happy to clarify the process. The coordinate system machinery is designed to work also without a table: using the function spatialdata.transformations.set_transformation, you can associate a coordinate transformation to any spatial element. The information is stored in the element's metadata (for points, shapes and single-scale raster types it is in my_element.attrs) as a dict: the keys are the names (str) of the coordinate systems, and the values (BaseTransformation) are the actual coordinate transformations. You can find examples of this in the transformations notebook from the docs.

A few more advanced notes on this. Currently, transformations are only defined between elements and coordinate systems, but a refactoring coming in the next months will simplify and generalize this part of the code. The refactoring will attach to each element (i.e. store in .attrs) just the name of the coordinate system the data lives in, and will store all the coordinate transformations (then defined between coordinate systems, not between an element and a coordinate system) in the SpatialData object itself.

This will be more intuitive because one will be able to interpret the coordinates of an element (e.g. the xy of points) directly as physical units, whereas in the current approach there is always a transformation present that tells how to map the element's coordinates to the target space (by default this transformation is an Identity() and the target space is called "global").

It will also be more powerful, because the new coordinate system refactoring will allow nesting transformations between various coordinate systems and sharing the same transformation across multiple elements. Currently, one always has to map each element, one by one, directly to the target coordinate system(s).

> Currently to iterate over FOVs and their associated labels, I ripped the accessor code from scverse/spatialdata-plot@main/src/spatialdata_plot/pp/basic.py and scverse/spatialdata-plot@main/src/spatialdata_plot/_accessor.py and wrote some custom iterator accessors.

This sounds very nice! If you would like, please consider making a contribution to the project, happy to review it in a PR.

> Is it possible to create coordinate systems without a table, or should they be created after all the data is acquired?

Both approaches are possible; you can call set_transformation after calling Image2DModel.parse(), or pass the transformations directly in the parse() call (there is an argument for coordinate transformations).

Downstream processing will still be possible, as the various APIs (transform, queries, aggregation, rasterization) have a to_coordinate_system argument that lets them operate in any coordinate system (with a few exceptions: polygon_query() has a bug and doesn't support coordinate systems whose transformation is not the identity, and raster aggregations don't support non-identity transformations yet). But in your case, since the coordinate systems would be used to group samples and the transformations would therefore be identities, it will work.

> This returns a SpatialData object with AnnData.X and then I can concatenate all the AnnData tables.

I recommend concatenating the SpatialData objects rather than the AnnData objects; otherwise region, region_key and instance_key would have to be adjusted manually. This is another of the points that will become easier after the new table design lands.

> And I create the coordinate system after this step correct?

It is also possible to create it last; just mind that if you create it after saving to Zarr, you need to manually re-save it to disk. The 00_xenium_visium notebook shows how to do that (it's at the end of the notebook). Also, Giovanni has an open PR that, once finished, will make updating the on-disk storage more ergonomic.

As a final note, I will be attending the Spatial Congress next week and would be happy to meet in person, also in the days after the conference.