quentinblampey / spatialdata_xenium_explorer

Interoperability between SpatialData and the Xenium Explorer
https://quentinblampey.github.io/spatialdata_xenium_explorer/
BSD 3-Clause "New" or "Revised" License
15 stars 0 forks source link

spatialdata_xenium_explorer.write #5

Closed Chuang1118 closed 6 months ago

Chuang1118 commented 6 months ago

Hi SOPA team,

Thanks for this excellent work!

Would you help me to convert zarr format into xenium-like format using spatialdata_xenium_explorer.write function?

I have Nanostring Cosmx dataset in format spatialdata like SpatialData tutorial

  1. It is possible to visualize cosmx data using the Xenium Explorer?
  2. If 1 is possible, cosmx data has the attribute called labels in stead of shapes, can you handle it?
  3. How about the arguments "gene_column" in spatialdata_xenium_explorer.write function in my case?

Thank you for your help, Chuang

quentinblampey commented 6 months ago

Hello @Chuang1118, thanks for your interest in spatialdata_xenium_explorer and sopa!

  1. We still have some difficulties with CosMX data because the image needs to be stitched (i.e., creating one image based on all the FOVs), see this issue. Actually, we should be able to convert each single FOV separately, but this is not very helpful, so I have to work on this stitching to be able to really support CosMX data.

  2. For that, you can refer to this function of Sopa that will geometrize the cells for you, for instance:

    
    from sopa.segmentation import shapes
    from shapely import affinity

mask is a numpy array of cell-ids per pixel, similar to your labels (see API documentation)

cells = shapes.geometrize(mask)

xmin and ymin are the min coordinate of the bounding box of your FOV

cells = [affinity.translate(cell, xmin, ymin) for cell in cells]

Now the cells are polygons in the global pixel coordinate system


This is not very easy to use for now, but everything will be simplified when we will be able to stitch CosMX data: then, you'll be able to directly use the full Sopa toolbox.

3. gene_column is the name of the column that contains the genes names in each of the dataframes (for instance, you can look at `sdata.points['1_points']` to see which column contains the genes names).

I'll let you know when I make progress about CosMX data. Meanwhile, I hope this helps!
Chuang1118 commented 6 months ago

Thanks a lot for the quick response!

  1. I deeply appreciate your effort to stitch multiple FOV. Currently, I work with CosMx TMA(Tissue Micro-array) data, I don't have this problem.
  2. In my opinion, this point is only link to handle stitching issue. In my case, I don't have to have xmin and ymin to define the cells within a given FOV.

My point is there are different data type(Labels & Shapes) through CosMx and Xemium/Merfish technologies, even though both them are SpatialData format. More details:

CosMx:

SpatialData object with:

├── Images
│     ├── '1_image': SpatialImage[cyx] (3, 3648, 5472)
│     ├── '2_image': SpatialImage[cyx] (3, 3648, 5472)
│     ├── '3_image': SpatialImage[cyx] (3, 3648, 5472)

├── Labels
│     ├── '1_labels': SpatialImage[yx] (3648, 5472)
│     ├── '2_labels': SpatialImage[yx] (3648, 5472)
│     ├── '3_labels': SpatialImage[yx] (3648, 5472)

└── Points
      ├── '1_points': DataFrame with shape: (3593716, 9) (2D points)
      ├── '2_points': DataFrame with shape: (2589915, 9) (2D points)
      ├── '3_points': DataFrame with shape: (2303614, 9) (2D points)

Xemium/Merfish: Do you have any idea to convert attribute Labels into Shapes?

SpatialData object with:
├── Images
│     └── 'rasterized': SpatialImage[cyx] (1, 522, 575)
├── Points
│     └── 'single_molecule': DataFrame with shape: (3714642, 3) (2D points)
├── Shapes
│     ├── 'anatomical': GeoDataFrame shape: (6, 1) (2D shapes)
│     └── 'cells': GeoDataFrame shape: (2399, 2) (2D shapes)
└── Table
      └── AnnData object with n_obs × n_vars = 2399 × 268
    obs: 'cell_id', 'region'
    uns: 'spatialdata_attrs': AnnData (2399, 268)
with coordinate systems:
▸ 'global', with elements:
        rasterized (Images), single_molecule (Points), anatomical (Shapes), cells (Shapes)
  1. There is sdata.points['10_points'] output, I don't see any columns contain the genes names.

Screenshot from 2024-03-29 09-14-31

Thanks a lot for your help, Chuang

quentinblampey commented 6 months ago

Okay great, so it means that each FOV corresponds to one sample, right? This simplifies everything! Concerning the gene_column, it is indeed "target": for instance if you do sdata["10_points"]["target"].compute(), you should get the gene names.

Note that we are using Dask, which is using lazy-loading, which is why you can't see the genes until you use .compute(). The purpose is RAM efficiency, since big slides are usually too big to fit in memory.

Here is a snippet to convert one FOV to the xenium explorer (in this example, I convert FOV=1). Note that you'll need Sopa, not only spatialdata_xenium_explorer, because it is needed to convert the labels to shapes.

import spatialdata
import sopa.io
from sopa.segmentation import shapes, StainingSegmentation
from sopa.segmentation.aggregate import Aggregator

### Read the CosMX data
sdata = spatialdata.read_zarr("cosmx_io.zarr") # the sample from the SpatialData tutorial
sdata = sdata.filter_by_coordinate_system("1") # get only the first FOV
del sdata.table # we delete the table because we will re-compute a new table below

### Transform labels to shapes
cells = shapes.geometrize(sdata["1_labels"].values)
StainingSegmentation.add_shapes(sdata, cells, image_key="1_image", shapes_key="1_shapes")

### Compute a new table
aggr = Aggregator(sdata, image_key="1_image", shapes_key="1_shapes")
aggr.update_table(gene_column="target", average_intensities=False)

### Convert to Xenium Explorer. The outputs will be saved in the "cosmx_xenium_explorer" directory
sopa.io.write("cosmx_xenium_explorer", sdata, gene_column="target")

Then, double-click on the cosmx_xenium_explorer/experiment.xenium file, it should open it in the Xenium Explorer! Let me know if everything works @Chuang1118 :)

Chuang1118 commented 6 months ago

Hello,

Yes, one TMA corresponds to one capture, there are non-continuous both the FOVs.

Thanks for your advice, I will try it, then give you feedback.

Chuang1118 commented 6 months ago

Unfortunately, your code doesn't work for me.

StainingSegmentation.add_shapes(sdata, cells, image_key="1_image", shapes_key="1_shapes")

This line crashes both my dataset and the SpatialData tutorial's dataset(cosmx_io.zarr). I have got same error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[7], line 1
----> 1 StainingSegmentation.add_shapes(sdata, cells, image_key="1_image", shapes_key="1_shapes")

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/sopa/segmentation/stainings.py:181](http://10.72.42.72:1237/lab/tree/PMLAB/USERS/Active/Chuang_DONG/scRNAseq/cosmx_230417_h_AITL/2_Analysis/V1/SpatialData/~/miniconda3/envs/sopa/lib/python3.10/site-packages/sopa/segmentation/stainings.py#line=180), in StainingSegmentation.add_shapes(cls, sdata, cells, image_key, shapes_key)
    178 geo_df.index = image_key + geo_df.index.astype(str)
    180 geo_df = ShapesModel.parse(geo_df, transformations=get_transformation(image, get_all=True))
--> 181 sdata.add_shapes(shapes_key, geo_df, overwrite=True)
    183 log.info(f"Added {len(geo_df)} cell boundaries in sdata['{shapes_key}']")

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py:1273](http://10.72.42.72:1237/lab/tree/PMLAB/USERS/Active/Chuang_DONG/scRNAseq/cosmx_230417_h_AITL/2_Analysis/V1/SpatialData/~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py#line=1272), in SpatialData.add_shapes(self, name, shapes, overwrite)
   1267 def add_shapes(
   1268     self,
   1269     name: str,
   1270     shapes: GeoDataFrame,
   1271     overwrite: bool = False,
   1272 ) -> None:
-> 1273     _error_message_add_element()

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata/_utils.py:281](http://10.72.42.72:1237/lab/tree/PMLAB/USERS/Active/Chuang_DONG/scRNAseq/cosmx_230417_h_AITL/2_Analysis/V1/SpatialData/~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata/_utils.py#line=280), in _error_message_add_element()
    280 def _error_message_add_element() -> None:
--> 281     raise RuntimeError(
    282         "The functions add_image(), add_labels(), add_points() and add_shapes() have been removed in favor of "
    283         "dict-like access to the elements. Please use the following syntax to add an element:\n"
    284         "\n"
    285         '\tsdata.images["image_name"] = image\n'
    286         '\tsdata.labels["labels_name"] = labels\n'
    287         "\t...\n"
    288         "\n"
    289         "The new syntax does not automatically updates the disk storage, so you need to call sdata.write() when "
    290         "the in-memory object is ready to be saved.\n"
    291         "To save only a new specific element to an existing Zarr storage please use the functions write_image(), "
    292         "write_labels(), write_points(), write_shapes() and write_table(). We are going to make these calls more "
    293         "ergonomic in a follow up PR."
    294     )

RuntimeError: The functions add_image(), add_labels(), add_points() and add_shapes() have been removed in favor of dict-like access to the elements. Please use the following syntax to add an element:

    sdata.images["image_name"] = image
    sdata.labels["labels_name"] = labels
    ...

The new syntax does not automatically updates the disk storage, so you need to call sdata.write() when the in-memory object is ready to be saved.
To save only a new specific element to an existing Zarr storage please use the functions write_image(), write_labels(), write_points(), write_shapes() and write_table(). We are going to make these calls more ergonomic in a follow up PR.

My debugging refer to stainings.py line 166

print(sopa.__version__)

1.0.7

cells = shapes.geometrize(sdata["1_labels"].values)
len(cells) # 2303
cells[:10]

[<POLYGON ((778 3632.364, 778.598 3640.496, 787.364 3646.821, 829.636 3646.82...>, <POLYGON ((944 3634.364, 944.272 3641.887, 949.125 3646.977, 1012.885 3646.9...>, <POLYGON ((1590.96 3619.83, 1603.364 3646.821, 1661.636 3646.821, 1664.934 3...>, <POLYGON ((1666 3629.845, 1665.063 3639.188, 1669.722 3646.917, 1710.912 364...>, <POLYGON ((1817 3613.364, 1817.598 3621.496, 1831.2 3635.224, 1836.917 3645....>, <POLYGON ((2188.519 3614.614, 2188.083 3633.278, 2194.247 3645.221, 2228.196...>, <POLYGON ((2228.519 3600.614, 2231.292 3646.196, 2283.636 3646.821, 2294.402...>, <POLYGON ((2505 3635.615, 2505.42 3642.2, 2511.088 3646.976, 2614.387 3647, ...>, <POLYGON ((2607.497 3617.342, 2622.181 3646.979, 2713.278 3646.917, 2717.688...>, <POLYGON ((3364.002 3624.497, 3367.272 3641.887, 3372.364 3646.821, 3424.912...>]

image = get_spatial_image(sdata, "1_image")
image

Screenshot from 2024-03-29 15-53-32

geo_df = gpd.GeoDataFrame({"geometry": cells})
geo_df.index = "1_image" + geo_df.index.astype(str)
geo_df = ShapesModel.parse(geo_df, transformations=get_transformation(image, get_all=True))
geo_df

Screenshot from 2024-03-29 15-55-46

sdata.add_shapes("1_shapes", geo_df, overwrite=True)

In the final, I got same error messages. I don't understand why add_shapes return None.

quentinblampey commented 6 months ago

Which version of spatialdata are you using?

I may have created this issue by authorizing spatialdata>=0.1.0... Can you try older versions of spatialdata, like 0.0.16?

I'll dig through it on Wednesday

Chuang1118 commented 6 months ago

Previous Results : sopa == 1.0.7 & spatialdata==0.1.1

Today, I have installed the older releases sopa. i.e. v1.0.6, v1.0.5 and v1.0.1, I have got issue#37.

Then, I have installed spatialdata==0.0.16 based on sopa == 1.0.7

pip install --force-reinstall -v "spatialdata==0.0.16"

there are some warnings: sopa 1.0.7 requires botocore==1.34.19, but you have botocore 1.31.17 which is incompatible. sopa 1.0.7 requires spatialdata>=0.1.1, but you have spatialdata 0.0.16 which is incompatible.

I am following your given snippet. I have modified import sopa.io to import sopa, because import sopa.io I got issue#37.

ImportError                               Traceback (most recent call last)
Cell In[24], line 1
----> 1 import sopa.io

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/sopa/io/__init__.py:4]
      2 from .explorer import write, align
      3 from .standardize import write_standardized
----> 4 from .transcriptomics import merscope, xenium, cosmx
      5 from .histopathology import wsi, wsi_autoscale
      6 from .report import write_report

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/sopa/io/transcriptomics.py:18]
     16 import dask.dataframe as dd
     17 import numpy as np
---> 18 import spatialdata_io
     19 import xarray
     20 from dask import array as da

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata_io/__init__.py:12]
     10 from spatialdata_io.readers.visium import visium
     11 from spatialdata_io.readers.visium_hd import visium_hd
---> 12 from spatialdata_io.readers.xenium import (
     13     xenium,
     14     xenium_aligned_image,
     15     xenium_explorer_selection,
     16 )
     18 __all__ = [
     19     "curio",
     20     "visium",
   (...)
     30     "visium_hd",
     31 ]
     33 __version__ = version("spatialdata-io")

File [~/miniconda3/envs/sopa/lib/python3.10/site-packages/spatialdata_io/readers/xenium.py:32]
     30 from spatial_image import SpatialImage
     31 from spatialdata import SpatialData
---> 32 from spatialdata._core.query.relational_query import _get_unique_label_values_as_index
     33 from spatialdata._types import ArrayLike
     34 from spatialdata.models import (
     35     Image2DModel,
     36     Labels2DModel,
   (...)
     39     TableModel,
     40 )

ImportError: cannot import name '_get_unique_label_values_as_index' from 'spatialdata._core.query.relational_query'

There is very strange import sopa works only time, I got sdata like below:

SpatialData object with:
├── Images
│     └── '1_image': SpatialImage[cyx] (3, 3648, 5472)
├── Labels
│     └── '1_labels': SpatialImage[yx] (3648, 5472)
├── Points
│     └── '1_points': DataFrame with shape: (<Delayed>, 9) (2D points)
├── Shapes
│     └── '1_shapes': GeoDataFrame shape: (7797, 1) (2D shapes)
└── Table
      └── AnnData object with n_obs × n_vars = 7797 × 1010
    obs: 'region', 'slide', 'cell_id', 'area'
    uns: 'sopa_attrs', 'spatialdata_attrs'
    obsm: 'spatial': AnnData (7797, 1010)
with coordinate systems:
▸ '1', with elements:
        1_image (Images), 1_labels (Labels), 1_points (Points), 1_shapes (Shapes)
▸ 'global', with elements:
        1_image (Images), 1_labels (Labels), 1_points (Points), 1_shapes (Shapes)
▸ 'global_only_image', with elements:
        1_image (Images), 1_shapes (Shapes)
▸ 'global_only_labels', with elements:
        1_labels (Labels), 1_points (Points)

This result is not reproducible, because I have re-run import sopa, I have got issue#37. and I can't write xemium-like format due to import spatialdata_io issue.

Chuang1118 commented 6 months ago

hi, In my opinion, since

RuntimeError: The functions add_image(), add_labels(), add_points() and add_shapes() have been removed in favor of dict-like access to the elements. Please use the following syntax to add an element:

    sdata.images["image_name"] = image
    sdata.labels["labels_name"] = labels
    ...

by using the new syntax e.g. sdata.shapes["1_shapes"]=geo_df, in this way, I can update the attribute. This issue has also an impact on aggregate.py aggr.update_table(gene_column="target", average_intensities=False)

would you tell me more about these 3 lines of code, how to rebuild Table(Anndata)?

del sdata.table # we delete the table because we will re-compute a new table below

### Compute a new table
aggr = Aggregator(sdata, image_key="1_image", shapes_key="1_shapes")
aggr.update_table(gene_column="target", average_intensities=False)
quentinblampey commented 6 months ago

Hello @Chuang1118,

All these issues were related to recent SpatialData updates, made just before their paper's publication. I needed to update some things in sopa, which should now be stable if you update to sopa==1.0.9.

So, if you run pip install sopa --upgrade, you should now have sopa==1.0.9, spatialdata>=0.1.2 and spatialdata-io>=0.1.2

The snippet above remains the same! Let me know if you are still having issues. Sorry again for these instabilities, SpatialData should now be more stable, and so will be Sopa 🙂

NB: sopa is up-to-date, but spatialdata_xenium_explorer also needs updates. In the meantime, please stay with sopa (the conversion to the xenium explorer is the same).

Chuang1118 commented 6 months ago

Hello,

I confirm that this is resolved by sopa==1.0.9. Then I can use Xenium Explorer version 2.0.0 to visualize experiment.xenium.

Thanks again for developing such a wonderful tool.

quentinblampey commented 6 months ago

Great!

Note that Sopa supports interoperability with the Xenium Explorer:

All this will be better detailed in a future tutorial :)