scverse / spatialdata-io

BSD 3-Clause "New" or "Revised" License
32 stars 19 forks source link

bounding_box_query does not work on cellpose result stored in spatialdata #166

Open mamyAndrianteranagna opened 1 week ago

mamyAndrianteranagna commented 1 week ago

Dear scverse Team,

First of all, thank you for your effort on developping integrative frameworks for spatial transcriptomics data.

Now, I'm analyzing MERFISH data using spatialdata. When I use cellpose to segment cells, I am not able to crop my data using the bounding_box_query function. However, this function works fine for data from generated by other segmentation tools such as baysor.

We suspect an incompatibility in the coordinate system but do not know how to check. Could you please help us to solve the issue?

The code to crop the data:

from spatialdata import bounding_box_query crop_sdata = bounding_box_query(sdata, min_coordinate=[8000,8000], max_coordinate=[8250,8250], axes=("x","y"), target_coordinate_system="microns",filter_table=True)

This is the error message:


LinAlgError Traceback (most recent call last) Input In [7], in <cell line: 2>() 1 from spatialdata import bounding_box_query ----> 2 crop_sdata = bounding_box_query(sdata, min_coordinate=[8000,8000], max_coordinate=[8250,8250], axes=("x","y"), target_coordinate_system="microns",filter_table=True)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/functools.py:889, in singledispatch..wrapper(*args, *kw) 885 if not args: 886 raise TypeError(f'{funcname} requires at least ' 887 '1 positional argument') --> 889 return dispatch(args[0].class)(args, **kw)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/spatialdata/_core/query/spatialquery.py:454, in (sdata, axes, min_coordinate, max_coordinate, target_coordinate_system, filter_table) 452 for element_type in ["points", "images", "labels", "shapes"]: 453 elements = getattr(sdata, element_type) --> 454 queried_elements = _dict_query_dispatcher( 455 elements, 456 bounding_box_query, 457 axes=axes, 458 min_coordinate=min_coordinate, 459 max_coordinate=max_coordinate, 460 target_coordinate_system=target_coordinate_system, 461 ) 462 new_elements[element_type] = queried_elements 464 tables = _get_filtered_or_unfiltered_tables(filter_table, new_elements, sdata)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:398, in _dict_query_dispatcher(elements, query_function, kwargs) 396 assert isinstance(d, dict) 397 if target_coordinate_system in d: --> 398 result = query_function(element, kwargs) 399 if result is not None: 400 # query returns None if it is empty 401 queried_elements[key] = result

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/functools.py:889, in singledispatch..wrapper(*args, *kw) 885 if not args: 886 raise TypeError(f'{funcname} requires at least ' 887 '1 positional argument') --> 889 return dispatch(args[0].class)(args, **kw)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/spatialdata/_core/query/spatialquery.py:728, in (polygons, axes, min_coordinate, max_coordinate, target_coordinatesystem) 720 = BoundingBoxRequest( 721 target_coordinate_system=target_coordinate_system, 722 axes=axes, 723 min_coordinate=min_coordinate, 724 max_coordinate=max_coordinate, 725 ) 727 # get the four corners of the bounding box --> 728 (intrinsic_bounding_box_corners, intrinsic_axes) = _get_bounding_box_corners_in_intrinsic_coordinates( 729 element=polygons, 730 axes=axes, 731 min_coordinate=min_coordinate, 732 max_coordinate=max_coordinate, 733 target_coordinate_system=target_coordinate_system, 734 ) 736 bounding_box_non_axes_aligned = Polygon(intrinsic_bounding_box_corners) 737 indices = polygons.geometry.intersects(bounding_box_non_axes_aligned)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:102, in _get_bounding_box_corners_in_intrinsic_coordinates(element, axes, min_coordinate, max_coordinate, target_coordinate_system) 100 # transform the coordinates to the intrinsic coordinate system 101 intrinsic_axes = get_axes_names(element) --> 102 transform_to_intrinsic = transform_to_query_space.inverse().to_affine_matrix( # type: ignore[union-attr] 103 input_axes=axes, output_axes=intrinsic_axes 104 ) 105 rotation_matrix = transform_to_intrinsic[0:-1, 0:-1] 106 translation = transform_to_intrinsic[0:-1, -1]

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/spatialdata/transformations/transformations.py:527, in Affine.inverse(self) 526 def inverse(self) -> BaseTransformation: --> 527 inv = np.linalg.inv(self.matrix) 528 return Affine(inv, self.output_axes, self.input_axes)

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/numpy/linalg/linalg.py:556, in inv(a) 554 a, wrap = _makearray(a) 555 _assert_stacked_2d(a) --> 556 _assert_stacked_square(a) 557 t, result_t = _commonType(a) 559 signature = 'D->D' if isComplexType(t) else 'd->d'

File /data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_sopa/lib/python3.10/site-packages/numpy/linalg/linalg.py:213, in _assert_stacked_square(*arrays) 211 m, n = a.shape[-2:] 212 if m != n: --> 213 raise LinAlgError('Last 2 dimensions of the array must be square')

LinAlgError: Last 2 dimensions of the array must be square

The structure of our spatialdata:

SpatialData object with: ├── Images │ └── 'output_region_0_z3': MultiscaleSpatialImage[cyx] (11, 96363, 78210), (11, 48181, 39105), (11, 24090, 19552), (11, 12045, 9776), (11, 6022, 4888) ├── Points │ └── 'output_region_0_transcripts': DataFrame with shape: (19797064, 9) (2D points) ├── Shapes │ ├── 'cellpose_boundaries': GeoDataFrame shape: (62498, 1) (2D shapes) │ ├── 'output_region_0_polygons': GeoDataFrame shape: (196467, 1) (2D shapes) │ └── 'sopa_patches': GeoDataFrame shape: (9, 1) (2D shapes) └── Tables └── 'table': AnnData (62498, 315) with coordinate systems: ▸ 'microns', with elements: output_region_0_z3 (Images), output_region_0_transcripts (Points), cellpose_boundaries (Shapes), output_region_0_polygons (Shapes), sopa_patches (Shapes)

Version of tools: spatialdata 0.1.2, cellpose 3.0.10, cellpose was run on image patches created by the sopa framework version 1.1.0 after creating multiples patches of the image)

Please let me know if you need more informations.

Looking forward for your reply.

Regards, Mamy

LucaMarconato commented 1 week ago

Hi thanks for the reporting the bug and happy that you find our package useful 😊 I haven't seen this bug before.

Could you paste here the content of .attrs in your shapes object? It could may be a problem with a transformation badly specified. Thanks!

If this is not enough to understand the root of the problem, I think here the easiest would be if you could subset the data by dropping all but the shapes that lead to the bug, and also try to subset the rows of the GeoDataFrame to reduce it to a very small dataset. After this if you could write it to Zarr, zip it and send it to me via Zulip: https://scverse.zulipchat.com/#user/480560.

But let's see if the first approach is enough 😊

mamyAndrianteranagna commented 1 week ago

Hello Luca,

Thanks for you rapid reply.

Here is the .attrs of the shape:

sdata.shapes['cellpose_boundaries'].attrs

{'transform': {'microns': Affine (x, y -> c, x, y) [0. 0. 0.] [1.07998399e-01 0.00000000e+00 4.20769397e+03] [0.00000000e+00 1.07999088e-01 1.62009116e+03] [0. 0. 1.]}}

Also, I 've tried to delete all other shapes except the cellpose one but I still had the same error:

del sdata.shapes['output_region_0_polygons'] del sdata.shapes['sopa_patches'] sdata

SpatialData object with: ├── Images │ └── 'output_region_0_z3': MultiscaleSpatialImage[cyx] (11, 96363, 78210), (11, 48181, 39105), (11, 24090, 19552), (11, 12045, 9776), (11, 6022, 4888) ├── Points │ └── 'output_region_0_transcripts': DataFrame with shape: (19797064, 9) (2D points) ├── Shapes │ └── 'cellpose_boundaries': GeoDataFrame shape: (62498, 1) (2D shapes) └── Tables └── 'table': AnnData (62498, 315) with coordinate systems: ▸ 'microns', with elements: output_region_0_z3 (Images), output_region_0_transcripts (Points), cellpose_boundaries (Shapes)

Cropping this object leads exactly to the same error.

Regards, Mamy

LucaMarconato commented 1 week ago

Thanks for sharing. I think that if you remove the c channel from the output of the transformation it will work. Was the transformation given with the c channel from cellpose?

mamyAndrianteranagna commented 1 week ago

Hello Luca,

Can you let us know how we can easily remove the c channel?

We tried to work directly on the transformation matrix of the 'cellpose_boundaries' but we didn't succeed to remove the error.

For example, we changed this

sdata.shapes['cellpose_boundaries'].attrs["transform"]

{'microns': Affine (x, y -> c, x, y) [0. 0. 0.] [1.07998399e-01 0.00000000e+00 4.20769397e+03] [0.00000000e+00 1.07999088e-01 1.62009116e+03] [0. 0. 1.]}

to this

sdata.shapes['cellpose_boundaries'].attrs["transform"]

{'microns': Affine (x, y -> c, x, y) [0. 0.] [ 0. 4207.69396883] [1.07999088e-01 1.62009116e+03] [0. 1.]}

but still have the same error.

Thanks for your help!

Best,

LucaMarconato commented 3 days ago

Solution

Here is how you can remove the c channel and set the transformation inside the object again:

affine_without_c = affine_from_data.to_affine(input_axes=("x", "y"), output_axes=("x", "y"))
set_transformation(sdata[element_name], affine_without_c)

Alternative solution

You could have also constructed a new Affine transformation from a new affine matrix (you can see the code for doing it below).

Saving to disk

Please note that calling set_transformation only changes the transformation metadata in-memory. If you want to also save this to disk you need to call set_transformation with write_to_sdata set to the SpatialData object you are considering. Alternatively you can call the method write_transformations() of the SpatialData object you are considering (please note that this method is not currently available in the version you can install from pip; we are making a new release this week).

Full example

Full example showing how to reproduce your bug and how to make it disappear (the example doesn't show how to save to disk).

##
import numpy.linalg
import pytest
from spatialdata.transformations import Affine, set_transformation, get_transformation
from spatialdata.datasets import blobs
from spatialdata import bounding_box_query

##
# data preparation
sdata = blobs()
element_name = "blobs_circles"

matrix = [
    [0.00000000e00, 0.00000000e00, 0.00000000e00],
    [1.07998399e-01, 0.00000000e00, 4.20769397e03],
    [0.00000000e00, 1.07999088e-01, 1.62009116e03],
    [0.00000000e00, 0.00000000e00, 1.00000000e00],
]

affine = Affine(matrix, input_axes=("x", "y"), output_axes=("c", "x", "y"))
set_transformation(sdata[element_name], affine)

##
# querying the data fails
with pytest.raises(numpy.linalg.LinAlgError):
    bounding_box_query(
        sdata[element_name],
        axes=("x", "y"),
        min_coordinate=[4000, 1000],
        max_coordinate=[5000, 2000],
        target_coordinate_system="global",
    )

##
# replacing the transformation with a matrix without the 'c' axis
# this is the same as the transformation above, here I am showing how to get it from the data
affine_from_data = get_transformation(sdata[element_name])

affine_without_c = affine_from_data.to_affine(input_axes=("x", "y"), output_axes=("x", "y"))
set_transformation(sdata[element_name], affine_without_c)

##
# querying the data succeeds
bounding_box_query(
    sdata[element_name],
    axes=("x", "y"),
    min_coordinate=[4000, 1000],
    max_coordinate=[5000, 2000],
    target_coordinate_system="global",
)
mamyAndrianteranagna commented 2 days ago

Thanks Luca for your reply. I will try it and let you know.

Regards, Mamy