Overlay custom segmentation using cell segmentation resolver

EST09 commented 2 months ago

Hi,

Thank you for SOPA, it looks really helpful. I'd love to use the cell segmentation resolver. However, I'm struggling a little with how best to implement.

I have custom segmentation for a particular cell type in H&E. I've added this as a shape to my spatialdata object in the same coordinate system as cell boundaries (the default Xenium segmentation). I'd love to overlay this custom segmentation onto the default Xenium cell segmentation (cell boundaries) so that any cells/transcripts in the default under the custom segmentation get merged into the custom segmentation but the other cells from the default are kept.

Is there a way to do this in SOPA at all please?

Best wishes, Emily

quentinblampey commented 2 months ago

Hello @EST09,

Thanks for your interest in Sopa. I have a few questions to ensure I understand correctly what you want to do:

The overall result should be one segmentation composed of your custom segmentation + the default segmentation for the other cells, right? I mean, you don't want to show both segmentation in the explorer separately, but, instead, it should be merged into one segmentation?
Have you already aligned your H&E data to the DAPI image? Else, you'll need to use the Xenium Explorer to align both images, because they are not in the same coordinate system by default.
For every cell in the resulting "resolved segmentation", I guess you want to know if it comes from the custom segmentation or from the default segmentation, right?
For the custom cells, do you want to aggregate the transcripts/channels inside their boundaries to have a cell-by-gene table?

In any case, this will be possible with Sopa, I'm trying to answer your question as accurately as possible!

EST09 commented 2 months ago

Hi,

Thank you so much for getting back. That's really great to hear!

Yes, I was hoping it would be one segmentation. I have both objects as shapes in the spatialdata object so can show them separately atm anyway.
Yes, all already aligned so I have a geodataframe containing the original segmentation and a smaller geodataframe containing the custom segmentation, both in the dapi "global" coordinate space. The geometry column for each is a list of polygons corresponding to each cell.
That would be nice but not crucial as the custom segmentation is for a particular cell type that is easily identifiable in H&E anyway. The Xenium segmentation just fails on them so I want to replace it for these cells.
Yes please, this would be amazing. Xenium fails on segmenting these cells so the reason for using the custom segmentation derived from H&E was to be able to use them in anndata etc going forward.

I made a small schematic to help,

It would be really great if the functionalities already in SOPA, thank you so much!

Best wishes, Emily

quentinblampey commented 2 months ago

Thanks for providing these details! I confirm that this can be done with Sopa, but it requires combining a few different functions from the API

I'll add a new function to make it easy to use (actually, I also need this for some of my projects, so I needed to do implement it anyway). I'll let you know, but this shouldn't take too much time, and it will be released in sopa==1.0.11

EST09 commented 2 months ago

Thank you so much, that would be brilliant if possible!

EST09 commented 2 months ago

Hi,

I managed to do a quick workaround for now using (leaves small artefact cells - was going to try and merge with cell with closest nuclei centroid later)

overlaid = sdata["cell_boundaries"].overlay(sdata["custom_segmentation"], how='difference')
frames = [overlaid, sdata["custom_segmentation"]
result = pd.concat(frames)
new_cell_boundaries_gdf = gpd.GeoDataFrame(result, columns=["geometry"])
sdata.shapes["new_cell_boundaries"] = ShapesModel.parse(new_cell_boundaries_gdf)

and then used the api reference you gave to update the table (thank you!)

aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_mip", shapes_key="new_cell_boundaries")
aggregator.update_table(gene_column="feature_name", average_intensities=False)

I'd like to export this back into Xenium explorer so everyone in the group can assess how well the segmentation is doing easily. I tried

sopa.io.explorer.write("file_path", sdata=sdata, shapes_key="new_cell_boundaries", image_key="x6", gene_column="feature_name")
#x6 is my h&E image

but I get the error

[INFO] (sopa.io.explorer.table) Writing table with 541 columns
[INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide
[INFO] (sopa.io.explorer.shapes) Writing 4493 cell polygons

.......

AttributeError: 'MultiPolygon' object has no attribute 'exterior'

My sdata["transcripts"] looks like this

Please may I just check if it's expecting something else? Or if there's something else I'm doing that's causing the error that you can see?

Thank you very much for all your help

Best wishes, Emily

quentinblampey commented 2 months ago

Hello @EST09, I think it's because at least one of your shapes is a MultiPolygon, while I expect to have only Polygon. This probably happened during your overlay, which can indeed create artefacts.

Meanwhile, I implemented the feature, and you can test it on master (it will soon be released, but I want to perform more checks).

If you want to try it, here is some code to help you:

import sopa.io
from sopa.segmentation import overlay_segmentation

gene_column = "feature_name"

overlay_segmentation(sdata, shapes_key="custom_segmentation", gene_column=gene_column)

NB: all cells with a high overlap with the custom segmentation are removed in favour of the new segmentation (use area_ratio_threshold in the overlay_segmentation function to control this behaviour). Also, I only count transcripts for the new cells, which should be much faster than re-computing all transcripts again.

You should then have a new shapes_key inside sdata, and you can run the function below to update the Xenium Explorer. Note that I provide mode="+cob", in order to only update the transcript counts (c), the cells obs (o), and the boundaries (b): it will be much faster to compute.

shapes_key = ... # something like "cell_boundaries+custom_segmentation"

sopa.io.write(
    "file_path", sdata=sdata, shapes_key=shapes_key, gene_column=gene_column, mode="+cob"
)

EST09 commented 2 months ago

Thank you so much! That's fantastic, esp. the updating transcripts in only new cells! I'll try it out tomorrow. You were totally right about Multipolygons, ended up using geopandas explode to solve it, this seems a lot quicker though). Thanks!

quentinblampey commented 2 months ago

Hello @EST09, just wanted to let you know that I released sopa==1.0.11, which contains this feature. You can read the documentation of this function here. Concerning the Explorer update, you can refer to this function. Hope it solves your issue :)

EST09 commented 1 month ago

Sorry for the delay in getting back, thank you a lot for all your help with this!

quentinblampey commented 1 month ago

Hello @EST09, is everything working as expected? Should I close the issue?

EST09 commented 1 month ago

Hi,

I actually ended up using a slightly different segmentation in the end, to fill in the gaps around the custom cells, but used your aggregator to compile the table.

I did try your segmentation at first and all worked as expected, thank you, although I end up with a slightly different table of counts. I don't know if it's a quirk of my Xenium segmentation but in order to get the aggregator to work (for either your or my changed segmentation) I needed to make the cells valid (using .make_valid()). This transformed some cells into multipolygons and then I chose largest polygon in the multipolygon going forward to represent that cell. After using your aggregator, and checking only the unchanged cells I end up with slightly different counts per cell. I'm putting this down to choosing the largest polygon but I do need to investigate further/not sure if you've come across this issue?

I also had a quick question about which transcripts are used in the aggregation. I've set gene_name as feature_name and this gives me 541 (as it includes the control probes) rather than the 377 genes. I ended up just subsetting the anndata back down to the 377 genes but was wondering if you knew if there was a column that did this directly?

#https://gustaveroussy.github.io/sopa/tutorials/api_usage/
aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_focus", shapes_key="new_cell_boundaries", overwrite=False)
#need to stop it changing the indices
aggregator.compute_table(gene_column="feature_name", average_intensities=False)

Thank you! I think it's basically closable - I think the issues are minor or a quirk of my data I need to investigate but would be interested if you've come across these before - thank you!

Best wishes, Emily

quentinblampey commented 1 month ago

Hi,

Indeed it's weird to have a different transcript count. Just to make sure I understand what you did: you tried the two aproaches below, right?

Add the shapes (as polygons) to the spatialdata object and run the Aggregator on these cells
Add the shapes (as polygons) to the spatialdata object and use the overlay_segmentation function

The overlay_segmentation function is also using the aggregator internally so it's unexpected to have different results.

Concerning your last question, no I think there is no way to prevent this for now. So, subsetting the anndata object is needed, even though it may be inconvenient, I agree...

quentinblampey commented 2 weeks ago

I'm closing this issue, but feel free to re-open if you need extra information!

gustaveroussy / sopa

Overlay custom segmentation using cell segmentation resolver #55