Open thewtex opened 2 years ago
@thewtex, what's the relationship here to #149?
@ivirshup: my understanding is that this is solely focused on adding the schema for the previous version (0.4). I assume you might have the identical change for /latest
?
@joshmoore ah, hadn't noticed the difference in files changed. Thanks for the heads up.
No objections on a technical level. I think we initially decided that we don't need an explicit representation of the identity, are there use-cases where we need it now?
This came up when coordinateTransformations: []
was present. The schema currently complains that there must be at least one transformation. Additionally, we could also update the schema so that the list can be empty. However, I think the approach to list an Identity transformation is a good, explicit way to indicate that this is intentional.
@thewtex, what's the relationship here to #149? @ivirshup: my understanding is that this is solely focused on adding the schema for the previous version (0.4).
Correct, the identity
transformation already exists in the documentation for v0.4, but it is not supported by the schema.
Could we instead remove identity from the spec and modify the schema to get rid of the requirement that there be at least one coordinate transformation? Requiring that there be an empty array (as opposed to not including the field) is enough to remind people that there is meant to be some mapping from data to world coordinates. It seems like (literally) needless API surface.
@clbarnes that would be a functional way to indicate that there is an identity coordinateTransformation, too. Removing identity would be a change to the already published spec, though.
I would like one of the proposed solutions, a) support for an explicit identity transformation, b) support for an empty coordinateTransformation[]
, that address the situation of an identity transformation to be integrated so I can use the schema with v0.4 datasets. @constantinpape @joshmoore do you have any guidance on this?
A required, empty coordinateTransformation[]
is a good idea, but, as an additional requirement, probably a candidate for v0.5.
@bogovicj what will work best and what do you recommend considering the current v0.5 draft?
do you have any guidance on this?
My tendency would be to do the thing most like the 0.4 spec text for the moment.
My tendency would be to do the thing most like the 0.4 spec text for the moment.
Since this is following the 0.4 spec, can we merge this? And make additional improvements as needed in 0.5?
Since this is following the 0.4 spec
By my reading, the identity transform is not allowed anywhere in v0.4, even though it is defined. From the 0.4 schema:
Each "datasets" dictionary ... The transformations are defined according to § 3.3 "coordinateTransformations" metadata. The transformation MUST only be of type translation or scale.
Each "multiscales" dictionary ... . The transformations MUST follow the same rules about allowed types, order, etc. as in "datasets:coordinateTransformations" and are applied after them.
The same is true in latest
. The thing most like the 0.4 text, then, is to just scrub the single mention of the identity transform which can't be included in the metadata anyway.
As written, the way to express an identity transform is a trivial scale (a single scale is required) so that's probably the right place here). If that wasn't explicit enough, I think support for a zero-length (but extant) coordinateTransformations
array gets the point across, while also simplifying the JSON structure (fewer characters), the schema (no length constraints, no required scale), and the spec (fewer transforms, no required scale, and fewer optional keys).
Since this is following the 0.4 spec
By my reading, the identity transform is not allowed anywhere in v0.4, even though it is defined. From the 0.4 schema:
Each "datasets" dictionary ... The transformations are defined according to § 3.3 "coordinateTransformations" metadata. The transformation MUST only be of type translation or scale.
Each "multiscales" dictionary ... . The transformations MUST follow the same rules about allowed types, order, etc. as in "datasets:coordinateTransformations" and are applied after them.
The same is true in
latest
. The thing most like the 0.4 text, then, is to just scrub the single mention of the identity transform which can't be included in the metadata anyway.As written, the way to express an identity transform is a trivial scale (a single scale is required) so that's probably the right place here). If that wasn't explicit enough, I think support for a zero-length (but extant)
coordinateTransformations
array gets the point across, while also simplifying the JSON structure (fewer characters), the schema (no length constraints, no required scale), and the spec (fewer transforms, no required scale, and fewer optional keys).
+1 to this. I don't think an explicit identity transformation can make any sense in 0.4 without making a lot of changes to the spec.
The 0.4 spec does include the identity specification:
and this PR ensures it is in the schema.
A scale consisting of 1.0's is not the same -- interpreting software handles it differently. And identity means there are not extra computations.
A zero-length coordinateTransformation
is not allowed in the 0.4 schema.
This backward-compatible update to the schema ensures that the text and the schema are consistent.
Removing a transformation in the spec is not backwards compatible.
@thewtex if I understand things correctly, there has never been a way to use the Identity transform in v0.4, because of the text here and here. In the two places were the spec describes uses of coordinateTransformations
, it explicitly says that only scale
and translation
are allowed.
To rectify this there are two options. The first option, supported by me and @clbarnes, is to remove references to the Identity transform completely, and update the spec to state that an empty collection of coordinateTransformations
is to be interpreted as expressing an identity transformation. This would technically not be a breaking change, because it was impossible to use the identity transform anyway.
The other option is this PR, but this still requires additional changes to the spec: we would need to remove this sentence to allow use of an explicit identity transform. Also, unlike removing the explicit identity transform, your PR would be a breaking change if you add support for the identity transform in multiscales
-- in v0.4, only scale and translation are allowed, so after this change, ome-ngff readers will need to be updated to handle datasets using the identity transform.
The 0.4 spec does include the identity specification:
As I said in my comment https://github.com/ome/ngff/pull/152#issuecomment-1638106191 , it's clearly defined, but it's not allowed. Or at least, I couldn't find anywhere that you're actually allowed to include an identity transformation - I'm very happy to be corrected. The only places coordinateTransforms are used is in the multiscale definition; however, in multiscales, coordinateTransformations MUST either be a scale or a translation, i.e. must not be an identity (as quoted in the linked comment).
So exactly as Davis says, we know what an identity transform is, but we're not allowed to use it. No current implementation can correctly support identity transforms because there is nowhere they can use them while being compliant with the spec. Our options, then, are
An extension to 2 would be to update the schema to allow zero-length coordinatetransformations lists, where currently we require a trivial scale. This would update the schema (making it simpler), and possibly require updating implementations (making them simpler, and in a backwards-compatible way, because no existing spec-compliant data has identity transforms and implementations will continue to be able to read trivial scales).
The transformations are defined according to [[#trafo-md]]. The transformation MUST only be of type
translation
orscale
.
This line is inconsistent. [[#trafo-md]]
refers to identity
, translation
, and scale
.
Identity transforms serve a purpose -- they indicate that the transformation is the identity. This is different than an undefined or unknown transformation.
There are inconsistencies with the specification, which this patch looks to resolve. I will push an update so the transformation list is consistent. The spec does include an identity transformation. These lines do exist:
https://github.com/ome/ngff/blob/99b8765c0f1b9aa6072b030c8f15f75c082a1e5e/0.4/index.bs#L337 https://github.com/ome/ngff/blob/99b8765c0f1b9aa6072b030c8f15f75c082a1e5e/0.4/index.bs#L341
And there are implementations that use it, which is why I created this patch. If there are implementations that do not support identity
, I am happy to create patches for them.
Thanks @thewtex for removing that inconsistent sentence.
Identity transforms serve a purpose -- they indicate that the transformation is the identity. This is different than an undefined or unknown transformation.
I don't think anyone is opposing identity transforms. I think the question is how they should be represented in the spec.
What are the advantages of an identity transform expressed as an object, vs representing the identity transform implicitly as an empty list? Speaking for myself, I like the idea of keeping the spec simple -- making the identity transform implicit rather than explicit would allow me to delete some code, which is a small win. But perhaps there is some advantage to the explicit identity transform that I'm not aware of?
There are inconsistencies with the specification, which this patch looks to resolve.
Right, but there are 2 ways to resolve the inconsistency. This resolution (adding references to identity in several places) requires changes to spec-compliant implementations, and increases complexity of the spec. The alternative resolution (dropping the one reference to identity) reduces complexity of the spec and requires no implementation changes to spec-compliant implementations: we shouldn't restrict changes on behalf of things not compliant with the spec, a colossal set including "the colour blue" and "the concept of fraternal love". I'm very on board with explicit representations of not-transforms, as opposed to unknown or undefined transforms; in my opinion, an empty list of transforms very clearly says "do not apply any transforms". A nullable or possibly-undefined list of transforms (e.g. as currently supported in the multiscales representation) is what should be avoided.
This has been dormant for a few months. How can we move this forward?
What are the advantages of an identity transform expressed as an object, vs representing the identity transform implicitly as an empty list?
When transformations are components of a multi-modal registration analysis workflow, there are often a chain of transformations that represent stages of registration, e.g. a calibration step, a re-orientation step, a template registration step, an atlas registration step. For some modalities or datasets in an analysis, a step may be the identity transformation, and it is useful to indicate that. An empty list does not capture that.
Right, but there are 2 ways to resolve the inconsistency. This resolution (adding references to identity in several places) requires changes to spec-compliant implementations
If the spec calls out "identity", which it does, an implementation that does not support "identity" is not "spec-compliant".
If the spec calls out "identity", which it does, an implementation that does not support "identity" is not "spec-compliant".
I feel like we're going around in circles a bit. It is correct to say that the current spec includes the identity transformation. However, the spec does not allow the use of that transformation. Therefore, any implementation which allows the use of the identity transformation is not spec-compliant. So while it's possible for implementations to have an identity transformation defined, if they can parse it from the metadata, or if they can write it into the metadata, they are not compliant with the spec. Therefore, there are zero functional changes associated with removal of the identity transform, it simply allows the removal of code which must be unreachable according to the current spec. Meanwhile, allowing the use of the identity transform requires changes which hook an implementation's representation of that transform into the parsing and transforming workflow.
Does this make sense? I feel like I may not be getting my point across very clearly.
I see your point about explicit transformations allowing logic like "In my particular workflow, the 3rd transformation is always the one generated by process X, and therefore if X generates an identity transform, I must be able to represent it in the transformation array". That's totally reasonable! However, I'm not sure it's necessarily compatible with the current spec, which limits the number and order of transformations (one scale, and up to one translation, which must occur after the scale). Unless your registration workflow is specifically designed to fit to the OME-NGFF spec (which it might be) I can see it easily breaking those rules.
@clbarnes I understand that you are trying to assert that the spec does not allow identity transformations, and an implementation with an identity transform is not spec-compliant. This assertion was made repeatedly. I understand your point.
However, it does not make sense that the spec states that transforms supported include the identity
transform, and an identity
transform is not allowed.
Clearly, clarifications are required. Perhaps this PR should be integrated into the v0.5 update?
Automated Review URLs