Closed EST09 closed 2 weeks ago
Hello @EST09,
Thanks for your interest in Sopa. I have a few questions to ensure I understand correctly what you want to do:
In any case, this will be possible with Sopa, I'm trying to answer your question as accurately as possible!
Hi,
Thank you so much for getting back. That's really great to hear!
I made a small schematic to help,
It would be really great if the functionalities already in SOPA, thank you so much!
Best wishes, Emily
Thanks for providing these details! I confirm that this can be done with Sopa, but it requires combining a few different functions from the API
I'll add a new function to make it easy to use (actually, I also need this for some of my projects, so I needed to do implement it anyway).
I'll let you know, but this shouldn't take too much time, and it will be released in sopa==1.0.11
Thank you so much, that would be brilliant if possible!
Hi,
I managed to do a quick workaround for now using (leaves small artefact cells - was going to try and merge with cell with closest nuclei centroid later)
overlaid = sdata["cell_boundaries"].overlay(sdata["custom_segmentation"], how='difference')
frames = [overlaid, sdata["custom_segmentation"]
result = pd.concat(frames)
new_cell_boundaries_gdf = gpd.GeoDataFrame(result, columns=["geometry"])
sdata.shapes["new_cell_boundaries"] = ShapesModel.parse(new_cell_boundaries_gdf)
and then used the api reference you gave to update the table (thank you!)
aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_mip", shapes_key="new_cell_boundaries")
aggregator.update_table(gene_column="feature_name", average_intensities=False)
I'd like to export this back into Xenium explorer so everyone in the group can assess how well the segmentation is doing easily. I tried
sopa.io.explorer.write("file_path", sdata=sdata, shapes_key="new_cell_boundaries", image_key="x6", gene_column="feature_name")
#x6 is my h&E image
but I get the error
[INFO] (sopa.io.explorer.table) Writing table with 541 columns
[INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide
[INFO] (sopa.io.explorer.shapes) Writing 4493 cell polygons
.......
AttributeError: 'MultiPolygon' object has no attribute 'exterior'
My sdata["transcripts"]
looks like this
Please may I just check if it's expecting something else? Or if there's something else I'm doing that's causing the error that you can see?
Thank you very much for all your help
Best wishes, Emily
Hello @EST09, I think it's because at least one of your shapes is a MultiPolygon
, while I expect to have only Polygon
. This probably happened during your overlay, which can indeed create artefacts.
Meanwhile, I implemented the feature, and you can test it on master
(it will soon be released, but I want to perform more checks).
If you want to try it, here is some code to help you:
import sopa.io
from sopa.segmentation import overlay_segmentation
gene_column = "feature_name"
overlay_segmentation(sdata, shapes_key="custom_segmentation", gene_column=gene_column)
NB: all cells with a high overlap with the custom segmentation are removed in favour of the new segmentation (use
area_ratio_threshold
in theoverlay_segmentation
function to control this behaviour). Also, I only count transcripts for the new cells, which should be much faster than re-computing all transcripts again.
You should then have a new shapes_key
inside sdata
, and you can run the function below to update the Xenium Explorer. Note that I provide mode="+cob"
, in order to only update the transcript counts (c
), the cells obs (o
), and the boundaries (b
): it will be much faster to compute.
shapes_key = ... # something like "cell_boundaries+custom_segmentation"
sopa.io.write(
"file_path", sdata=sdata, shapes_key=shapes_key, gene_column=gene_column, mode="+cob"
)
Thank you so much! That's fantastic, esp. the updating transcripts in only new cells! I'll try it out tomorrow. You were totally right about Multipolygons, ended up using geopandas explode to solve it, this seems a lot quicker though). Thanks!
Hello @EST09, just wanted to let you know that I released sopa==1.0.11
, which contains this feature. You can read the documentation of this function here.
Concerning the Explorer update, you can refer to this function.
Hope it solves your issue :)
Sorry for the delay in getting back, thank you a lot for all your help with this!
Hello @EST09, is everything working as expected? Should I close the issue?
Hi,
I actually ended up using a slightly different segmentation in the end, to fill in the gaps around the custom cells, but used your aggregator to compile the table.
I did try your segmentation at first and all worked as expected, thank you, although I end up with a slightly different table of counts. I don't know if it's a quirk of my Xenium segmentation but in order to get the aggregator to work (for either your or my changed segmentation) I needed to make the cells valid (using .make_valid()). This transformed some cells into multipolygons and then I chose largest polygon in the multipolygon going forward to represent that cell. After using your aggregator, and checking only the unchanged cells I end up with slightly different counts per cell. I'm putting this down to choosing the largest polygon but I do need to investigate further/not sure if you've come across this issue?
I also had a quick question about which transcripts are used in the aggregation. I've set gene_name as feature_name and this gives me 541 (as it includes the control probes) rather than the 377 genes. I ended up just subsetting the anndata back down to the 377 genes but was wondering if you knew if there was a column that did this directly?
#https://gustaveroussy.github.io/sopa/tutorials/api_usage/
aggregator = sopa.segmentation.Aggregator(sdata, image_key="morphology_focus", shapes_key="new_cell_boundaries", overwrite=False)
#need to stop it changing the indices
aggregator.compute_table(gene_column="feature_name", average_intensities=False)
Thank you! I think it's basically closable - I think the issues are minor or a quirk of my data I need to investigate but would be interested if you've come across these before - thank you!
Best wishes, Emily
Hi,
Indeed it's weird to have a different transcript count. Just to make sure I understand what you did: you tried the two aproaches below, right?
Aggregator
on these cellsoverlay_segmentation
functionThe overlay_segmentation
function is also using the aggregator internally so it's unexpected to have different results.
Concerning your last question, no I think there is no way to prevent this for now. So, subsetting the anndata object is needed, even though it may be inconvenient, I agree...
I'm closing this issue, but feel free to re-open if you need extra information!
Hi,
Thank you for SOPA, it looks really helpful. I'd love to use the cell segmentation resolver. However, I'm struggling a little with how best to implement.
I have custom segmentation for a particular cell type in H&E. I've added this as a shape to my spatialdata object in the same coordinate system as cell boundaries (the default Xenium segmentation). I'd love to overlay this custom segmentation onto the default Xenium cell segmentation (cell boundaries) so that any cells/transcripts in the default under the custom segmentation get merged into the custom segmentation but the other cells from the default are kept.
Is there a way to do this in SOPA at all please?
Best wishes, Emily