spatial-image / multiscale-spatial-image

Generate a multiscale, chunked, multi-dimensional spatial image data structure that can serialized to OME-NGFF.
Apache License 2.0
38 stars 8 forks source link

multiscale class not being inherited correctly #80

Closed giovp closed 5 months ago

giovp commented 1 year ago

There are several operations that do not return a MultiscaleSpatialImage class but a datatree.DataTree. e.g.

import numpy as np
from spatial_image import to_spatial_image
from multiscale_spatial_image import to_multiscale
from copy import deepcopy

array = np.random.randint(0, 256, size=(128,128), dtype=np.uint8)
image = to_spatial_image(array)
multiscale = to_multiscale(image, [2, 4])

deepcopy(multiscale)
>>> datatree.DataTree
multiscale.copy()
>>> datatree.DataTree
multiscale.copy(deep=True)
>>> datatree.DataTree
multiscale.mean(dim="x")
>>> datatree.DataTree
multiscale.sel(x=10, method="nearest")
>>> datatree.DataTree

I've been playing around with subclassing differently DataTree without success. I wonder what's the best solution for this? I think it should either always return a MultiscaleSpatialImage or always a DataTree (and MultiscaleSpatialImage would then be a parser class, but IMHO SpatialImage should then be consistent, and it doesn't suffer from such issues).

xref https://github.com/scverse/spatialdata/issues/269

thewtex commented 1 year ago

@TomNicholas what do you think?

TomNicholas commented 1 year ago

Hi, thanks for making me aware of what you're doing here!

So DataTree was not intended to be subclassed. In fact, we currently recommend against subclassing xarray objects in general, and this this recommendation really applies to datatree too because it follows similar patterns internally.

It's worth asking why you are subclassing? Looking at the definition of MultiscaleSpatialImage here, it looks like MultiscaleSpatialImage only has one extra method added to DataTree. Adding extra methods is what xarray accessors allow you to do (without subclassing), and datatree does support those accessors.

If you really want to subclass it has been done, and if you want to improve the experience of subclassing xarray objects we would love someone to take that project on upstream.

thewtex commented 1 year ago

@TomNicholas thank you for the thoughts and pointers!

What we are looking for is:

  1. the type identification (MultiscaleSpatialImage class)
  2. the to_zarr functionality
  3. possible additional functionality, attributes in the future

From what you shared it sounds like an accessor would be better, and appropriate, addressing https://github.com/scverse/spatialdata/issues/269, etc.? Will 1. the type identification, still be available? Through copies, etc.? Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')? @giovp @LucaMarconato will this work for type identification in spatialdata?

LucaMarconato commented 1 year ago

Thanks for the discussion. I think this would be a good solution and it would be compatible with the type identification used in spatialdata. In fact in our case we can replace all the MultiscaleSpatialImage with DataTree and eventually all the SpatialImage with DataArray, so as long as we can still call validate() on the objects, everything should work.

One note, I find a bit inconsistent that the scales of a MultiscaleSpatialImage object are DataArray and not SpatialImage. Also a user reported this. I think that the considerations on inheriting xarray classes should also apply to SpatialImage, and therefore I would consider using DataArray with accessors.

giovp commented 1 year ago

thanks all for the discussion, agree with @LucaMarconato that type consistency across spatial-image project would be useful, seems like an accessor could be the best solution.

Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')

yes indeed, I think we might end up refactor the models to infer type based on data type + number/type of dimensions.

TomNicholas commented 1 year ago

Will 1. the type identification, still be available? Through copies, etc.? Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')?

@thewtex no I don't think this would work - when the accessor is registered it becomes available on all DataTree objects, so hasattr(i, 'msi') would not distinguish anything useful.

In fact in our case we can replace all the MultiscaleSpatialImage with DataTree and eventually all the SpatialImage with DataArray, so as long as we can still call validate() on the objects, everything should work.

@LucaMarconato this seems to line up with what's currently recommended. :+1:

infer type based on data type + number/type of dimensions.

This would of course still work. (You wouldn't be technically inferring "Type", but you could use it to distinguish valid from invalid datasets.)

thewtex commented 1 year ago

@LucaMarconato @TomNicholas excellent, thank you for the thoughts! Let's migrate to an accessor, then. @giovp already has a draft in #81 -- please take a look. :sparkler: :rocket: :metal:

I think that the considerations on inheriting xarray classes should also apply to SpatialImage, and therefore I would consider using DataArray with accessors.

:+1: agreed.

aeisenbarth commented 11 months ago

This issue occurs currently also when using pytest fixtures.

Pytest passes the return value of a fixture function (1) directly to test functions using the fixture. However, when another fixture (2) depends on fixture (1), it receives a deepcopy, where multiscale images are suddenly datatrees, and fail dispatching in overloaded functions.

In case someone else encounters this, here my workaround that I apply in the test function on the fixture argument:

def workaround_pytest_fixture_breaking_multiscale_spatial_image(sdata: SpatialData) -> SpatialData:
    # When a fixture depends on another fixture which returns a MultiscaleSpatialImage instance,
    # the first fixture receives a DataTree instead. The object looses the ability to be recognized
    # by isinstance(obj, MultiscaleSpatialImage).
    # Maybe this is caused by a copy/deepcopy operation.
    for name, image in sdata.images.items():
        if isinstance(image, DataTree) and not isinstance(image, MultiscaleSpatialImage):
            repaired_image = MultiscaleSpatialImage.from_dict(image.to_dict())
            sdata.images[name] = repaired_image
    for name, labels in sdata.labels.items():
        if isinstance(labels, DataTree) and not isinstance(labels, MultiscaleSpatialImage):
            repaired_labels = MultiscaleSpatialImage.from_dict(labels.to_dict())
            sdata.labels[name] = repaired_labels
    return sdata
thewtex commented 11 months ago

@aeisenbarth thanks for the report :+1:

In your use case, is multiscale-spatial-image 1.0.0 used? It incorporates #81 , which changes the inheritance and instance identification as described at the top of the PR.

Regardless, we should probably also have tests for the copy/deepcopy operation.

aeisenbarth commented 11 months ago

I just updated to multiscale-spatial-image 1.0.0, but the issue persists with latest spatialdata main branch. Image2DModel.parse(…, scale_factors=…) calls to_multiscale and passes the DataTree (with new msi attribute) to spatialdata.transformations._utils:_get_transformations, but that can only dispatch on MultiscaleSpatialImage and does not handle DataTree. I believe just the dispatching needs to be updated everywhere in spatialdata.

giovp commented 11 months ago

hi @aeisenbarth , indeed we haven't completed the migration to msi 1.0.0 yet (unfortunately). From your message I thought it was a behaviour of the pytest fixture but I haven't investigated further.

giovp commented 5 months ago

hi @aeisenbarth I think the latest sptialdata (on pypi) supports this (thanks to https://github.com/scverse/spatialdata/pull/594 ). I will close this for now but feel free to reopen!

LucaMarconato commented 5 months ago

Also @aeisenbarth I have implemented a deepcopy operation that works for all the spatial elements and handle both data and metadata, you can use it with from spatialdata import deepcopy.