Currently we explain how to work with tables in a notebook that has been reported as too technical.
We are considering moving that notebook to a technical section of the docs and instead making a new notebook that shows a biological use case.
Here is a possible story for the new notebook.
loading the data
load data containing a segmentation (with a spatialdata-io reader)
load some extra annotations from CSV files of increasing complexity, e.g. with or without a header, with missing rows, with multiple samples, etc. (idea from @minhtien-trinh), showing how we can go from a CSV file to an AnnData table that annotates an element
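As a rough illustration of the CSV step above, here is a minimal sketch using only pandas. The file contents, the column names `cell_id` and `cell_type`, the element name `cell_labels`, and both helper functions are hypothetical. It reads the same annotation with and without a header, then aligns it to the cell ids of the segmentation so that a missing row becomes an explicit NaN. The resulting frame is the kind of `obs` that could then be wrapped in an AnnData object and passed to `TableModel.parse` with `region`, `region_key` and `instance_key`; that last step is left out here to keep the example dependency-free.

```python
import io

import pandas as pd

# Hypothetical annotation files; the contents are made up for illustration.
csv_with_header = "cell_id,cell_type\n1,hepatocyte\n2,endothelial\n4,hepatocyte\n"
csv_without_header = "1,hepatocyte\n2,endothelial\n4,hepatocyte\n"

# Cell ids present in the segmentation (assumed); cell 3 has no annotation row.
segmentation_ids = [1, 2, 3, 4]


def read_annotation(text, has_header):
    """Read a two-column annotation CSV, with or without a header row."""
    if has_header:
        df = pd.read_csv(io.StringIO(text))
    else:
        # Without a header we have to supply the column names ourselves.
        df = pd.read_csv(io.StringIO(text), header=None, names=["cell_id", "cell_type"])
    return df.set_index("cell_id")


def align_to_segmentation(df, ids, region):
    """Return one row per segmented cell; cells without annotation get NaN."""
    obs = df.reindex(ids)
    obs.index.name = "cell_id"
    obs = obs.reset_index()
    # Column that would play the role of `region_key` in a SpatialData table.
    obs["region"] = region
    return obs


obs_a = align_to_segmentation(read_annotation(csv_with_header, True), segmentation_ids, "cell_labels")
obs_b = align_to_segmentation(read_annotation(csv_without_header, False), segmentation_ids, "cell_labels")
print(obs_a)
```

Both variants end up with the same four rows, which is probably the main point to make in the notebook: the messy part is the CSV parsing, not the table annotation itself.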
resegmenting
resegment it with a simple algorithm not requiring heavy dependencies, but mention state of the art/recommended methods
now show how to add this new segmentation to the SpatialData object
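For the resegmentation step, a minimal sketch of what a "simple algorithm" could look like, assuming a global threshold followed by connected-component labeling is good enough for the tutorial. It only needs numpy and scipy; the image is synthetic and `simple_segmentation` is a hypothetical helper name. Adding the result to the SpatialData object (for instance by parsing the array with `Labels2DModel.parse` and assigning it into `sdata.labels`) is not shown.

```python
import numpy as np
from scipy import ndimage as ndi


def simple_segmentation(image, threshold):
    """Label connected regions of pixels brighter than `threshold`."""
    foreground = image > threshold
    # Default connectivity: pixels touching along an edge belong to the same object.
    labels, n_objects = ndi.label(foreground)
    return labels, n_objects


# Synthetic image with two bright blobs on a dark background.
image = np.zeros((8, 8), dtype=float)
image[1:3, 1:3] = 1.0
image[5:7, 4:7] = 1.0

labels, n_objects = simple_segmentation(image, threshold=0.5)
print(n_objects)
print(labels)
```

The output is an integer array with 0 for background and one id per object, which is the same layout as a labels element.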
comparing the segmentations by spatial overlap
say that we had 2 segmentation masks, and each mask had a gene expression table
show how to create 2 new tables (one for each segmentation mask), so that the original tables are filtered and reindexed to contain only the cells for which the 2 segmentation masks agree (spatial overlap)
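One way the overlap comparison could be sketched, assuming "agree" means an intersection over union (IoU) above 0.5 between a cell in one mask and a cell in the other. The masks, the tables (plain DataFrames standing in for the AnnData tables), the gene name and the `match_by_iou` helper are all made up for illustration.

```python
import numpy as np
import pandas as pd


def match_by_iou(labels_a, labels_b, min_iou=0.5):
    """Return label pairs (id_a, id_b) whose IoU is strictly above `min_iou`."""
    both = (labels_a > 0) & (labels_b > 0)
    # Count overlapping pixels for every (id_a, id_b) pair that co-occurs.
    pairs, inter = np.unique(
        np.stack([labels_a[both], labels_b[both]], axis=1), axis=0, return_counts=True
    )
    ids_a, counts_a = np.unique(labels_a[labels_a > 0], return_counts=True)
    ids_b, counts_b = np.unique(labels_b[labels_b > 0], return_counts=True)
    area_a = dict(zip(ids_a.tolist(), counts_a.tolist()))
    area_b = dict(zip(ids_b.tolist(), counts_b.tolist()))
    rows = []
    for (a, b), n in zip(pairs.tolist(), inter.tolist()):
        iou = n / (area_a[a] + area_b[b] - n)
        if iou > min_iou:
            rows.append((a, b, iou))
    return pd.DataFrame(rows, columns=["id_a", "id_b", "iou"])


# Two toy masks: cell 1 and cell 10 largely overlap, the other cells do not.
labels_a = np.zeros((6, 6), dtype=int)
labels_a[0:3, 0:3] = 1
labels_a[3:6, 3:6] = 2

labels_b = np.zeros((6, 6), dtype=int)
labels_b[0:3, 0:2] = 10
labels_b[4:6, 0:2] = 20
labels_b[5:6, 5:6] = 30

# Stand-ins for the two expression tables, indexed by cell id.
table_a = pd.DataFrame({"gene_x": [5, 7]}, index=[1, 2])
table_b = pd.DataFrame({"gene_x": [4, 0, 1]}, index=[10, 20, 30])

matches = match_by_iou(labels_a, labels_b)
# Filter both tables to the matched cells and give them a shared index.
agree_a = table_a.loc[matches["id_a"]].reset_index(drop=True)
agree_b = table_b.loc[matches["id_b"]].reset_index(drop=True)
print(matches)
```

With a threshold of 0.5 and a strict comparison, each cell can match at most one cell in the other mask, so the two filtered tables line up row by row. The cells that are dropped here are the ones that occur in only one of the two segmentations.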
On my own dataset I would suggest the following setup:
Reading in data and annotating shapes
load in dataset (without table, but segmented). I do have it as a zarr though now, so not with a reader
label the veins in the region, so add a shapes layer to the object.
annotate the shapes layer: with a csv?
Addition of table to sdata object
Do I explain here the region/regionsprops etc or do I skip this and do we make an easy function to actually perform these steps?
We perform an aggregate on the transcripts, so we have features per cell.
We import the annotation, and add it to the anndata object (this is actually just anndata stuff).
We calculate the distance for every cell to the closest annotated object, save the distance and save the name of the shape per cell in the table?
We can do this with less well-annotated CSVs, but is this not more of an AnnData tutorial?
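The distance step above could look roughly like this, assuming cell centroids are available as x/y columns and the annotated veins are polygons. This sketch uses only shapely and pandas; the geometries, the names `vein_1` and `vein_2`, the output column names and the `nearest_shape` helper are hypothetical. For larger datasets a spatial join such as geopandas `sjoin_nearest` is probably the better fit.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical annotated shapes (e.g. veins), keyed by their annotation name.
veins = {
    "vein_1": Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    "vein_2": Polygon([(10, 10), (12, 10), (12, 12), (10, 12)]),
}

# Hypothetical cell centroids, indexed by cell id.
centroids = pd.DataFrame({"x": [1.0, 5.0, 10.0], "y": [1.0, 0.0, 7.0]}, index=[1, 2, 3])


def nearest_shape(x, y, shapes):
    """Return (name, distance) of the shape closest to the point (x, y)."""
    point = Point(x, y)
    name, geom = min(shapes.items(), key=lambda item: item[1].distance(point))
    # The distance is 0 for points that lie inside the shape.
    return name, geom.distance(point)


results = [nearest_shape(x, y, veins) for x, y in zip(centroids["x"], centroids["y"])]
# These two columns are what would be stored per cell in the table.
centroids["closest_vein"] = [name for name, _ in results]
centroids["distance_to_vein"] = [dist for _, dist in results]
print(centroids)
```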
Multiple tables
Here I have two relevant options:
I can create a 'bad' segmentation mask, import it, and create a second table based on the shapes layer of this (also then explain how to transfer a labels layer to a shapes layer with rasterize).
In this case, you could look at the cells that are similar in the two datasets, or you could (IMO more interesting) look at the cells that are only occurring in one of the two datasets.
We could compare the cells using the spatial join functions (based on spatialData I assume).
I can create different expansions of the cells, and then create shapes layers and tables for all expansion ratios.
In this case, the cell labels for all cells will be the same over the different tables, which of course is not standard, but it is an easy example on how you can use the multiple tables to perform comparisons.
I can show this on the labels layer and on the shapes layer (shapes can overlap, whereas labels cannot).
You could also transfer the annotation layer from one table to another if you'd want to.
You can look at how the expression of certain genes per cell type changes across the different tables.
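For the expansion option, a minimal sketch of how the labels could be grown by a fixed distance while keeping the cell ids unchanged, assuming numpy and scipy are acceptable dependencies. The mask is synthetic and the `expand_labels` helper below is written out for illustration; scikit-image ships a function of the same name in `skimage.segmentation` that follows the same idea. For a shapes layer, buffering the polygons would be the analogous operation, and there the expanded cells are allowed to overlap.

```python
import numpy as np
from scipy import ndimage as ndi


def expand_labels(labels, distance):
    """Grow each label by up to `distance` pixels without overwriting other labels."""
    # For every background pixel: distance to, and index of, the nearest labeled pixel.
    dist, indices = ndi.distance_transform_edt(labels == 0, return_indices=True)
    expanded = labels.copy()
    grow = (labels == 0) & (dist <= distance)
    nearest = tuple(idx[grow] for idx in indices)
    expanded[grow] = labels[nearest]
    return expanded


# Toy mask with two single-pixel cells.
labels = np.zeros((7, 12), dtype=int)
labels[3, 2] = 1
labels[3, 9] = 2

# Cell area (in pixels) per expansion distance; the cell ids stay the same.
areas = {d: np.bincount(expand_labels(labels, d).ravel())[1:].tolist() for d in (0, 1, 2)}
print(areas)
```

Because the ids are identical across expansions, one table per expansion can be indexed the same way, which makes the per-gene comparison between tables straightforward.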