scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License
226 stars 42 forks source link

Workaround to use aggregate on elements of different shapes? #618

Closed tsvvas closed 2 months ago

tsvvas commented 3 months ago

Hello spatialdata team,

Thank you for this wonderful package for spatial omics analysis.

I am currently using spatialdata to align immunohistochemistry image with xenium using alignment matrix from xenium explorer. I followed the tutorial and created a spatialdata object without problems. Now I want to calculate the intensity of IHC image using the nucleus boundaries, however the IHC image is only ~1/3 of the xenium slide so when I call aggregate, it throws an understandable error: ValueError: input arrays must have equal shapes. I tried cropping the image, but it didn't help.

Is there a workaround I can use to extract features from the IHC image?

I am trying to do that manually.

alignment_matrix_path = "../data/images/matrix.csv"
ihc_path = "../data/images/IHC.ome.tif"
image = spio.xenium_aligned_image(ihc_path, alignment_matrix_path)

# I transformed the image
tf_image = spatialdata.transform(image, to_coordinate_system="global") 

# Filtered polygons
image_height, image_width = tf_image.shape[1:]
image_bbox = box(0, 0, image_height, image_width)
polygons = spatialdata.transform(sdata.shapes["nucleus_boundaries"], to_coordinate_system="global")
filtered_polygons = polygons[polygons.intersects(image_bbox)]

# Selected channel (later, when the workaround is found I will iterate through them)
image_multi_channel = tf_image.values
image_single_channel = image_multi_channel[0] 

# Created a mask
label_image = np.zeros((image_height, image_width), dtype=int)

for i, poly in enumerate(filtered_polygons.geometry, start=1):
    coords = np.array(poly.exterior.coords).T
    rr, cc = draw.polygon(coords[1], coords[0], label_image.shape)
    label_image[rr, cc] = i

# Calculate properties using regionprops_table
props_table = pd.DataFrame(
    regionprops_table(
        label_image,
        intensity_image=image_single_channel,
        properties=['label', 'intensity_mean', 'centroid'],
    )
)

This approach kind of works, but I noticed that the transformed image is still not exactly aligned, and contains a transformation, albeit translation and not affine as it was at the start.

ax = filtered_polygons.plot(alpha=0.5, edgecolor="k")
ax.imshow(np.moveaxis(image_multi_channel, 0, -1));

image

> spatialdata.transformations.get_transformation(tf_image)
Translation (c, y, x)
    [    0.         -1200.22207853  8769.91854689]

If there is a better approach, how can I extract image features from immunohistochemistry slide? If this is the way, how can I get the fully aligned image as numpy.ndarray?

Thank you in advance, Vasily

LucaMarconato commented 3 months ago

Hi, happy to hear that you find our packages useful for aligning images.

I would use the function rasterize(), to rasterize the polygons into your desired target resolution and have it match the IHC image (you can use the target_width argument to achieve this).

Also you can call get_extent() on your IHC image and pass use the returned bounding box values to populate the argument min_coordinate and max_coordinate of rasterize() function.

After this, you will have your polygon into a labels object with the same resolution of your IHC, and you can call aggregate() (and aggregation methods from third-party libraries).

Final comment, in the case of Xenium you have already labels that are more precise than the cell boundary polygons. You can also call rasterize() on them with the same arguments that I described above to perform a pixel resampling.

Please let me know if you solves your problem.

tsvvas commented 3 months ago

Thank you for a quick reply! I tried to rasterize, but it throws NotImplementedError:

im_coords = spatialdata.get_extent(sdata, coordinate_system="global", elements=[image_name])
spatialdata.rasterize(
    sdata.shapes["nucleus_boundaries"],
    ["x", "y"],
    min_coordinate=[im_coords["x"][0], im_coords["y"][0]],
    max_coordinate=[im_coords["x"][1], im_coords["y"][1]],
    target_coordinate_system="global",
    target_unit_to_pixels=1.0,
)

I just noticed that there is a new version of spatialdata published yesterday, and rasterize doesn't raise this error in the new version. However, when I tried to update the package, it threw some import errors. I will try to solve the installation errors and then run the rasterize function in spatialdata-0.2.1

LucaMarconato commented 3 months ago

Exactly, as you noticed the rasterize function as been released only recently (with 0.2.0). Which import errors are you getting?

tsvvas commented 2 months ago

My current setup is a conda env file for environment with requirements.txt with spatialdata[extra] in it. When I update spatialdata with pip with this setup it behaves randomly sometimes. I just solve it with force reinstall for now.

Now the rasterize function works. The previous setup generated a SpatialImage object I think, so I used the return_regions_as_labels=True option and got an xarray.DataArray 'image' (y: 37316 x: 29898) object where both x and y are float64. Which should be fine as other labels have the same data type. However, aggregate fails with an error:

TypeError: Only int, np.int16, np.int32, np.int64 or string allowed as dtype for instance_key column in obs. Dtype found to be float64
tsvvas commented 2 months ago

I figured out that the problem is not in x or y axis, but in the main xarray data type. This correction works:

nucleus_labels = spatialdata.rasterize(
    sdata.shapes["nucleus_boundaries"],
    ["x", "y"],
    min_coordinate=[im_coords["x"][0], im_coords["y"][0]],
    max_coordinate=[im_coords["x"][1], im_coords["y"][1]],
    target_coordinate_system="global",
    target_unit_to_pixels=1.0,
    return_regions_as_labels=True
)
sdata.labels["nuclei_subset"] = nucleus_labels.astype(np.int64)
sdata_im = sdata.aggregate(values=image_name, by="nuclei_subset", agg_func="mean")

If this is how it should be, then the issue can be closed

LucaMarconato commented 2 months ago

Perfect, thanks for the information provided and happy that the issue is solved.

LouisK92 commented 1 month ago

Thanks to both of you for the issue and the solution, helped a lot!

Small follow up issue that I ran into + a solution: In case you crop the sdata beforehand (to reduce the size of the rasterized image), make sure that your crop is based on integer coordinates, otherwise you might end up with correct shapes but still get the same Error (ValueError: input arrays must have equal shapes), which was confusing for me.

import numpy as np
import spatialdata as sd

# crop sdata based on extent of shapes of interest
shape_key="..."

data_extent_ = sd.get_extent(sdata[shape_key], coordinate_system="global")
data_extent = {}
for key, values in data_extent_.items():
    data_extent[key] = np.array([np.floor(values[0]), np.ceil(values[1])])

sdata_crop = sd.bounding_box_query(
    sdata,
    min_coordinate=[data_extent["x"][0], data_extent["y"][0]],
    max_coordinate=[data_extent["x"][1], data_extent["y"][1]],
    axes=("x", "y"),
    target_coordinate_system="global",
)

then continue with sdata_crop instead of sdata as above.