ome / ome-model

OME model (specification, code generator, implementation)

XMLMockObjects: make artificial SHA1 schema-compliant #158

Closed melissalinkert closed 1 year ago

melissalinkert commented 2 years ago

See https://github.com/ome/bioformats/issues/3810

sbesson commented 2 years ago

To my initial surprise, this change failed the unit tests in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-image/1151 with the following validation error:

2022-04-18 00:17:05,448 [main] ERROR loci.common.xml.XMLTools - cvc-length-valid: Value '1234567890ABCDEF1234' with length = '10' is not facet-valid with respect to length '20' for type 'Hex40'.
2022-04-18 00:17:05,449 [main] ERROR loci.common.xml.XMLTools - cvc-type.3.1.3: The value '1234567890ABCDEF1234' of element 'HashSHA1' is not valid.

After some investigation, I suspect the explanation is that the HashSHA1 element is of type Hex40, which is defined at:

https://github.com/ome/ome-model/blob/e6efb8802492880f8fde7be99c1c9b83be215e76/specification/src/main/resources/released-schema/2016-06/ome.xsd#L1467-L1476

So the length restriction of 20 might apply not to the hexadecimal string itself but to the binary data it encodes. This matches what Python's binascii.unhexlify returns for the former 40-character value and the newer 20-character one:

>>> from binascii import unhexlify
>>> len(unhexlify("1234567890ABCDEF1234567890ABCDEF12345678"))
20
>>> len(unhexlify("1234567890ABCDEF1234"))
10
>>> unhexlify("1234567890ABCDEF1234567890ABCDEF12345678")
b'\x124Vx\x90\xab\xcd\xef\x124Vx\x90\xab\xcd\xef\x124Vx'
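
The same check can be wrapped in a small helper to sanity-check a candidate value before it goes into a test fixture. This is only a sketch: `is_valid_hex40` and `HEX40_OCTETS` are hypothetical names, not part of ome-model or Bio-Formats, and the constant 20 is taken from the length facet reported in the validation error above. It relies on the XML Schema rule that the length of a hexBinary value is counted in octets of the decoded data, not in hex characters.

```python
from binascii import unhexlify

# Assumption: 20 is the length facet reported for Hex40 in the error above.
HEX40_OCTETS = 20


def is_valid_hex40(value: str) -> bool:
    """Hypothetical helper: True if value decodes to exactly 20 octets."""
    try:
        decoded = unhexlify(value)
    except ValueError:
        # binascii.Error subclasses ValueError; raised for odd-length
        # strings or characters that are not hexadecimal digits.
        return False
    return len(decoded) == HEX40_OCTETS
```

With this helper, the 40-character value `"1234567890ABCDEF1234567890ABCDEF12345678"` is accepted and the 20-character value `"1234567890ABCDEF1234"` is rejected, consistent with the validator output.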

melissalinkert commented 2 years ago

Following today's formats team discussion, I am removing HashSHA1 entirely from XMLMockObjects and re-including this PR.

The longer-term solution is to update the schema to deprecate HashSHA1, and ensure that nothing else in our stack uses this feature. git grep HashSHA1 on https://github.com/ome/bioformats/tree/3d73a85c5af636cf5af42e952adae4e22611585b shows nothing.