gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / MERSCOPE / CosMx / PhenoCycler / MACSima / Hyperion) that scales to millions of cells
https://gustaveroussy.github.io/sopa/
BSD 3-Clause "New" or "Revised" License
94 stars 8 forks source link

Classifying positive cells #83

Closed dylanmr closed 2 weeks ago

dylanmr commented 2 weeks ago

Hi - Thanks for the great tool! I am currently working with a xenium dataset where i have a single post-stain for a marker of interest. I would like to try and classify cells as "positive" or "negative" for my stain.

I have ran through the sopa protocol and have been able to align my xenium and post-stain images but am a bit stuck on the classification step. I normally use a machine learning object classifier approach in qupath - but I am getting stuck at exporting the aligned tifs from sopa/spatialdata/xenium explorer to use in qupath.

Do you have any suggestions on how to go about this in sopa/spatialdata?

Best, Dylan

quentinblampey commented 2 weeks ago

Hi @dylanmr,

All this can be done inside sopa, I think there is no need to use qupath.

First, run this average_channels function to get the mean staining intensity per cell. Since you have only a single marker, you can manually choose a threshold for positive vs negative, or use this preprocess_fluo function to compute Z-scores.

Here is some pseudo-code:

from sopa.segmentation.aggregate import average_channels
from sopa.annotation import preprocess_fluo
import spatialdata

sdata = spatialdata.read_zarr("...") # read your sdata object
adata = sdata["table"]

# add the right image_key and shapes_key below
X = average_channels(sdata, image_key="your_single_stain_image", shapes_key="your_segmentation_boudaries")

# you can save the intensities there (if you don't already have some)
adata.obsm["intensities"] = X

z_scores = preprocess_fluo(adata)
is_positive = z_scores > 0

Do you use the Xenium Explorer? If yes, you can read this tutorial to udpdate it with your new cell classes.

Hope this helps!

dylanmr commented 2 weeks ago

Hi Quentin,

Thanks for the help and pseudocode! This runs quite well now. Unfortunately calling the cells as positive/negative based on the z-score is a bit noisy for my image (I can probably play with the threshold to get close tho). I have attached a screenshot here below without any gamma correction and minimal thresholding.

image

In qupath, with some image processing the positive cells become quite clear and i can train quite an accurate model in qupath using the mean intensity, max intensity and standard deviation. Do you have any plans to include these calculations in the aggregation functions?

I am thinking I could create a small training set from my manual classification and then I could just use these metrics and train a simple random forest/xgboost model on this matrix.

Best Dylan

quentinblampey commented 2 weeks ago

Thanks for the additional information, this is really relevant! I'll try to add the max intensity and the standard deviation in the aggregate function. I'll try to work on that in the next few weeks, I'll let you know :)

dylanmr commented 2 weeks ago

That would be great! Thank you! Just one other potential suggestion would be to take into account the area of the signal - and potentially include the ability to threshold the image before averaging.

I can threshold the image with this code - which seems to help significantly

` data = sdata.images["21737_7_adjusted"] channel_to_threshold = "AF594" min_threshold_value = 2610 max_threshold_value = 3675

Apply the thresholds to the selected channel

thresholded_channel = data.sel(c=channel_to_threshold) thresholded_channel = thresholded_channel.where(thresholded_channel >= min_threshold_value, min_threshold_value) thresholded_channel = thresholded_channel.where(thresholded_channel <= max_threshold_value, max_threshold_value)

Scale the thresholded values between 0 and 1

min_value = thresholded_channel.min() max_value = thresholded_channel.max() scaled_channel = (thresholded_channel - min_value) / (max_value - min_value) sdata.images["21737_7_adjusted"].loc[{'c': channel_to_threshold}] = scaled_channel `

However I end up with cells that look like this called as positive:

image image image

where as what im looking for should look like this:

image image image

Best, Dylan

quentinblampey commented 2 weeks ago

Indeed, such data processing can be very useful, but I prefer to let the user decide how to do that, and then apply the channel averaging from sopa. I think I will also return some percentile of intensity per cell, like the percentile 80 or percentile 90. It can be useful when you have only a few pixels that are bright

dylanmr commented 2 weeks ago

Great! Thanks for all your help and the discussion - looking forward to continue working with sopa!