scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

The SpatialData object is not self-contained #710

Open Felicie-Giraud-Sauveur opened 2 weeks ago

Felicie-Giraud-Sauveur commented 2 weeks ago

Hello,

I am contacting you about the “not self-contained” message when saving sdata to a new location. Here is the example:

import spatialdata as sd
from spatialdata.datasets import blobs

sdata = blobs()
sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")

sdata = sd.read_zarr("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")

sdata.table.obs['test'] = 'test'
sdata.table.obs.head()

sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr")

And it outputs:

INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
         locations outside /Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr). Please see the
         documentation of `is_self_contained()` to understand the implications of working with SpatialData objects
         that are not self-contained.
INFO     The Zarr backing store has been changed from /Volumes/One
         Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr the new file path: /Volumes/One
         Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr

In this case, if I completely delete test_blobs.zarr from my disk, could I lose information in test_2_blobs.zarr or run into problems later? I am having trouble understanding the implications of an object being “not self-contained”.

Thanks in advance for your help!

LucaMarconato commented 16 hours ago

Hi, you can investigate this by printing the sdata object after writing it. Here is an example (see the bottom part of the output):

SpatialData object, with associated Zarr store: /Users/macbook/temp/test_blobs.zarr2
├── Images
│     ├── 'blobs_image': DataArray[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': DataTree[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': DataArray[yx] (512, 512)
│     └── 'blobs_multiscale_labels': DataTree[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
with the following Dask-backed elements not being self-contained:
    ▸ blobs_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_image
    ▸ blobs_multiscale_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_multiscale_image
    ▸ blobs_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_labels
    ▸ blobs_multiscale_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_multiscale_labels
    ▸ blobs_points: /Users/macbook/temp/test_blobs.zarr/points/blobs_points/points.parquet/part.0.parquet

Basically, the images, labels and points that have been read still refer to the old Zarr location. To fix this, simply read the object again from the new location on disk.
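
A minimal sketch of that write-then-reread pattern, assuming only `SpatialData.write()` and `spatialdata.read_zarr()` as used earlier in this thread. The helper name `write_and_reload` and its `reader` parameter are hypothetical (they are not part of spatialdata); `reader` exists only so the helper can be exercised without the library installed.

```python
from pathlib import Path


def write_and_reload(sdata, new_path, reader=None):
    """Write `sdata` to `new_path`, then read it back from that location.

    Hypothetical helper, not part of spatialdata. The returned object is
    the one read from the new store, so its lazily loaded elements should
    point at `new_path` rather than at the store they were first read from.
    """
    if reader is None:
        # Imported lazily; by default we use spatialdata's own reader.
        import spatialdata as sd

        reader = sd.read_zarr

    new_path = Path(new_path)
    sdata.write(new_path)    # elements in memory may still reference the old store
    return reader(new_path)  # fresh object backed by the new store
```

With the example from the original post, this would be called as `sdata = write_and_reload(sdata, ".../test_2_blobs.zarr")`. Calling `sdata.is_self_contained()` on the result is one way to check whether the "not self-contained" condition is gone.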

I will try to think of a way to make this info message less obscure, maybe by asking the user to read the object again if they want a self-contained object.

Please let me know if you have additional questions on this!