Closed giovp closed 5 months ago
@TomNicholas what do you think?
Hi, thanks for making me aware of what you're doing here!
So DataTree was not intended to be subclassed. In fact, we currently recommend against subclassing xarray objects in general, and this this recommendation really applies to datatree too because it follows similar patterns internally.
It's worth asking why you are subclassing? Looking at the definition of MultiscaleSpatialImage
here, it looks like MultiscaleSpatialImage
only has one extra method added to DataTree
. Adding extra methods is what xarray accessors allow you to do (without subclassing), and datatree does support those accessors.
If you really want to subclass it has been done, and if you want to improve the experience of subclassing xarray objects we would love someone to take that project on upstream.
@TomNicholas thank you for the thoughts and pointers!
What we are looking for is:
MultiscaleSpatialImage class
)to_zarr
functionalityFrom what you shared it sounds like an accessor would be better, and appropriate, addressing https://github.com/scverse/spatialdata/issues/269, etc.? Will 1. the type identification
, still be available? Through copies, etc.? Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')
? @giovp @LucaMarconato will this work for type identification in spatialdata?
Thanks for the discussion. I think this would be a good solution and it would be compatible with the type identification used in spatialdata. In fact in our case we can replace all the MultiscaleSpatialImage
with DataTree
and eventually all the SpatialImage
with DataArray
, so as long as we can still call validate()
on the objects, everything should work.
One note, I find a bit inconsistent that the scales of a MultiscaleSpatialImage
object are DataArray
and not SpatialImage
. Also a user reported this. I think that the considerations on inheriting xarray classes should also apply to SpatialImage
, and therefore I would consider using DataArray
with accessors.
thanks all for the discussion, agree with @LucaMarconato that type consistency across spatial-image project would be useful, seems like an accessor could be the best solution.
Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')
yes indeed, I think we might end up refactor the models to infer type based on data type + number/type of dimensions.
Will 1. the type identification, still be available? Through copies, etc.? Then type identification is
isinstance(i, DataTree)
andhasattr(i, 'msi')
?
@thewtex no I don't think this would work - when the accessor is registered it becomes available on all DataTree
objects, so hasattr(i, 'msi')
would not distinguish anything useful.
In fact in our case we can replace all the
MultiscaleSpatialImage
withDataTree
and eventually all theSpatialImage
withDataArray
, so as long as we can still callvalidate()
on the objects, everything should work.
@LucaMarconato this seems to line up with what's currently recommended. :+1:
infer type based on data type + number/type of dimensions.
This would of course still work. (You wouldn't be technically inferring "Type", but you could use it to distinguish valid from invalid datasets.)
@LucaMarconato @TomNicholas excellent, thank you for the thoughts! Let's migrate to an accessor, then. @giovp already has a draft in #81 -- please take a look. :sparkler: :rocket: :metal:
I think that the considerations on inheriting xarray classes should also apply to SpatialImage, and therefore I would consider using DataArray with accessors.
:+1: agreed.
This issue occurs currently also when using pytest fixtures.
Pytest passes the return value of a fixture function (1) directly to test functions using the fixture. However, when another fixture (2) depends on fixture (1), it receives a deepcopy, where multiscale images are suddenly datatrees, and fail dispatching in overloaded functions.
In case someone else encounters this, here my workaround that I apply in the test function on the fixture argument:
def workaround_pytest_fixture_breaking_multiscale_spatial_image(sdata: SpatialData) -> SpatialData:
# When a fixture depends on another fixture which returns a MultiscaleSpatialImage instance,
# the first fixture receives a DataTree instead. The object looses the ability to be recognized
# by isinstance(obj, MultiscaleSpatialImage).
# Maybe this is caused by a copy/deepcopy operation.
for name, image in sdata.images.items():
if isinstance(image, DataTree) and not isinstance(image, MultiscaleSpatialImage):
repaired_image = MultiscaleSpatialImage.from_dict(image.to_dict())
sdata.images[name] = repaired_image
for name, labels in sdata.labels.items():
if isinstance(labels, DataTree) and not isinstance(labels, MultiscaleSpatialImage):
repaired_labels = MultiscaleSpatialImage.from_dict(labels.to_dict())
sdata.labels[name] = repaired_labels
return sdata
@aeisenbarth thanks for the report :+1:
In your use case, is multiscale-spatial-image
1.0.0
used? It incorporates #81 , which changes the inheritance and instance identification as described at the top of the PR.
Regardless, we should probably also have tests for the copy/deepcopy operation.
I just updated to multiscale-spatial-image 1.0.0, but the issue persists with latest spatialdata main branch.
Image2DModel.parse(…, scale_factors=…)
calls to_multiscale
and passes the DataTree (with new msi
attribute) to spatialdata.transformations._utils:_get_transformations
, but that can only dispatch on MultiscaleSpatialImage and does not handle DataTree. I believe just the dispatching needs to be updated everywhere in spatialdata.
hi @aeisenbarth , indeed we haven't completed the migration to msi 1.0.0 yet (unfortunately). From your message I thought it was a behaviour of the pytest fixture but I haven't investigated further.
hi @aeisenbarth I think the latest sptialdata (on pypi) supports this (thanks to https://github.com/scverse/spatialdata/pull/594 ). I will close this for now but feel free to reopen!
Also @aeisenbarth I have implemented a deepcopy
operation that works for all the spatial elements and handle both data and metadata, you can use it with from spatialdata import deepcopy
.
There are several operations that do not return a
MultiscaleSpatialImage
class but adatatree.DataTree
. e.g.I've been playing around with subclassing differently
DataTree
without success. I wonder what's the best solution for this? I think it should either always return aMultiscaleSpatialImage
or always aDataTree
(andMultiscaleSpatialImage
would then be a parser class, but IMHOSpatialImage
should then be consistent, and it doesn't suffer from such issues).xref https://github.com/scverse/spatialdata/issues/269