rendeirolab / LazySlide

Modularized and scalable whole slide image analysis
https://lazyslide.rtfd.io/

Design of WSI data structure #42

Closed Mr-Milk closed 1 month ago

Mr-Milk commented 1 month ago

The current design of the data structure that represents a WSI has several drawbacks. This issue is for discussing whether there are better solutions or whether the current one is already acceptable.

Another thing to discuss here is whether we should ship WSIData as a separate package, in the way anndata is separate from scanpy.

Design

We use a WSIData class that abstracts two things:

  1. Read WSI
  2. Read/Write for SpatialData

Image

Pros

  1. Easy implementation
  2. Performant I/O when interacting with the slide image
  3. Avoid saving a copy of the slide

Cons

  1. Users need to keep the slide file and its SpatialData file together as a pair
  2. Cannot fully integrate with the SpatialData ecosystem

Tissue contours

Attach in the shapes slot. Tissue holes should be passed to the holes parameter when constructing each Polygon.
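As a minimal sketch of the holes idea, assuming the Polygon here is shapely's `Polygon` (the coordinates are made up for illustration):

```python
from shapely.geometry import Polygon

# Outer boundary of one tissue region (made-up level-0 pixel coordinates)
shell = [(0, 0), (10, 0), (10, 10), (0, 10)]
# One hole inside the tissue, e.g. a tear or a lumen
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]

# Holes are passed as a sequence of rings through the `holes` parameter
tissue = Polygon(shell, holes=[hole])

n_holes = len(tissue.interiors)  # number of interior rings
tissue_area = tissue.area        # shell area minus hole area
```

Keeping holes on the polygon itself means area and containment queries exclude them without any extra bookkeeping.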

Tiles

Attach in the shapes slot.

Tiles are saved as polygons. The tiles table should also record each tile's x and y in image coordinates at level 0.

A tile_spec should be recorded.
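A rough sketch of what this could look like, using only the standard library. The `TileSpec` fields and the helper names are hypothetical, not the actual LazySlide design:

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical tile_spec: the parameters needed to regenerate the tiles."""
    tile_px: int       # tile edge length, in pixels at `level`
    stride_px: int     # step between tile origins, in pixels at `level`
    level: int         # pyramid level the tiles are read from
    downsample: float  # downsample factor of `level` relative to level 0


def tile_origins(width0, height0, spec):
    """Top-left (x, y) of each tile in level-0 coordinates.

    Only tiles that fit entirely inside the image are kept.
    """
    size0 = round(spec.tile_px * spec.downsample)
    step0 = round(spec.stride_px * spec.downsample)
    return [
        (x, y)
        for y in range(0, height0 - size0 + 1, step0)
        for x in range(0, width0 - size0 + 1, step0)
    ]


def tile_polygon(x, y, size0):
    """Corner coordinates of a square tile, starting at its top-left corner."""
    return [(x, y), (x + size0, y), (x + size0, y + size0), (x, y + size0)]


spec = TileSpec(tile_px=256, stride_px=256, level=1, downsample=2.0)
origins = tile_origins(2048, 1024, spec)  # level-0 image of 2048 x 1024 pixels
first_tile = tile_polygon(*origins[0], size0=512)
spec_record = asdict(spec)  # plain dict, easy to serialize next to the tiles
```

Storing the origins at level 0 keeps the table independent of the level the tiles were read from, while the spec records how to map back.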

Feature Matrix

This is always a 2D array. Attach it in the images slot.

Tile neighbors sparse matrix

Save in the obsp slot of an AnnData object stored in the tables slot of SpatialData.
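For illustration, here is one way the neighbor entries could be computed before they are stored. It is a sketch under assumptions: tiles lie on a regular grid, neighbors are the 4-connected ones, and `grid_neighbors` is a hypothetical helper:

```python
def grid_neighbors(origins, step):
    """Return COO triples (rows, cols, data) linking each tile to its
    4-connected neighbors (left, right, up, down) on a regular grid."""
    index = {xy: i for i, xy in enumerate(origins)}
    rows, cols, data = [], [], []
    for (x, y), i in index.items():
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            j = index.get((x + dx, y + dy))
            if j is not None:  # skip positions with no tile (image border, background)
                rows.append(i)
                cols.append(j)
                data.append(1)
    return rows, cols, data


# 2 x 2 grid of tiles spaced 512 pixels apart, in level-0 coordinates
origins = [(0, 0), (512, 0), (0, 512), (512, 512)]
rows, cols, data = grid_neighbors(origins, step=512)
pairs = set(zip(rows, cols))
```

The triples can be handed to `scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n))`, and the resulting n_tiles x n_tiles matrix is the kind of object obsp holds.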

afrendeiro commented 1 month ago

Thanks for writing this up.

A few initial thoughts:

Mr-Milk commented 1 month ago

Thanks for sharing your thoughts, I agree with most of the points.

Mr-Milk commented 1 month ago

However, if we think of the feature matrix for a WSI as analogous to gene expression in spatial omics, it actually makes more sense to save it in tables.