scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

Check for uniqueness of names also across the file storage #504

Open LucaMarconato opened 1 month ago

LucaMarconato commented 1 month ago

TL;DR

When #398 is addressed, we can remove _check_element_not_on_disk_with_different_type() as it will not be needed anymore.

Full issue

When adding an in-memory element to a SpatialData object, the mechanism around self._shared_keys in Elements does not check (by design) whether an element with the same name already exists on disk. This gives the user more flexibility when replacing an on-disk element with a new in-memory one.

Nevertheless, because we assume that names are unique across objects of different types, the in-memory check above does not guarantee that, with the new incremental IO APIs, the user cannot write to disk an element whose name is already present in another group. For instance, if images/my_image exists on disk and the user has only labels/my_image in memory, saving it to disk would corrupt the object.

Currently, _check_element_not_on_disk_with_different_type() checks that the name of the element being added, deleted, or changed (metadata, transformations) does not refer to an on-disk element of a different type. In general, though, I believe we should not complicate the code with these checks and should instead address the problem at the root by fixing https://github.com/scverse/spatialdata/issues/398.
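
To make the scenario concrete, here is a minimal sketch of what such a cross-group check could look like. It is not the actual spatialdata implementation: the helper names (find_conflicting_groups, check_name_unique_on_disk), the ELEMENT_GROUPS tuple, and the assumption that each element lives at <store>/<group>/<name> are all hypothetical and only meant to illustrate the idea.

```python
from pathlib import Path

# Assumed group names, one per element type (hypothetical, for illustration only).
ELEMENT_GROUPS = ("images", "labels", "points", "shapes", "tables")


def find_conflicting_groups(store: Path, name: str, element_type: str) -> list[str]:
    """Return the on-disk groups, other than `element_type`, that already contain `name`."""
    return [
        group
        for group in ELEMENT_GROUPS
        if group != element_type and (store / group / name).exists()
    ]


def check_name_unique_on_disk(store: Path, name: str, element_type: str) -> None:
    """Raise if `name` is already stored on disk under a different element type."""
    conflicts = find_conflicting_groups(store, name, element_type)
    if conflicts:
        raise ValueError(
            f"Cannot write {element_type}/{name}: the name is already used on disk "
            f"in {', '.join(conflicts)}."
        )
```

With images/my_image on disk, calling check_name_unique_on_disk(store, "my_image", "labels") would raise, while the same call with "images" would pass, which still allows replacing an on-disk element with an in-memory one of the same type.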