tlambert03 / ome-types

native Python dataclasses for the OME data model
https://ome-types.readthedocs.io/en/latest/
MIT License

Validation errors with Slidebook ome-tiff export #170

Closed (jni closed this issue 1 year ago)

jni commented 1 year ago

I'm working with a dataset that was originally in slidebook format but was exported as an ome-tiff series. This gave the following alleged OME XML:

https://gist.github.com/jni/c4b09934715246c158397b24db7fbb3b

I tried to parse it with:

import ome_types

ome = ome_types.from_xml('ome-meta.xml', parser='lxml', validate=False)

which gives the error:

ValidationError: 624 validation errors for OME

(Full traceback at: https://gist.github.com/jni/e87f511c892475de72c880b83617e10d)

I fully expect that Slidebook is producing garbage, but I'm wondering if it's easily fixed garbage. At any rate I'm presently only after the pixel physical size, and potentially channel display colors and contrast limits, so any suggestions for grabbing that reliably from a junk xml will be appreciated. 😃

tlambert03 commented 1 year ago

Yeah, this is a bit of a tough one. It's definitely garbage: for example, they use the literal string "Not Set" for Channel/@AcquisitionMode instead of omitting it or using "Other", and "Alternate Source" for IlluminationType instead of "Other". They're just not observing the schema, so there's not much we can do here short of implementing a mode that discards all data that fails validation, which I'm not particularly interested in doing at the moment just because someone is including random strings 😂

But if you want to clean it up manually, this seems to work for me (on main, so let me know if it doesn't on the latest release):

from ome_types import OME, to_dict

def clean_stuff(ome_dict: dict) -> dict:
    """Drop the channel fields that Slidebook filled with out-of-schema strings."""
    for image in ome_dict['images']:
        for channel in image['pixels']['channels']:
            # or try to do something more clever
            channel.pop('acquisition_mode', None)
            channel.pop('illumination_type', None)
    return ome_dict

# parse without validation, clean the plain dict, then build the validated model
my_dict = to_dict('ome-meta.xml', validate=False)
ome = OME(**clean_stuff(my_dict))
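
Since the immediate goal in the question is only the physical pixel size and the channel colors, another option is to skip ome_types for those fields and read the attributes straight from the XML with the standard library. The following is a minimal sketch, not part of ome-types: the names local_name, to_float, decode_rgba and read_pixel_metadata are hypothetical, it assumes the export keeps the standard OME element and attribute names (Pixels, Channel, PhysicalSizeX/Y/Z, Color), and it assumes Color is the signed 32-bit RGBA integer described by the OME schema. It has not been run against the Slidebook file from the gist.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def to_float(raw):
    """Return a float, or None if the attribute is missing or not numeric."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def decode_rgba(raw):
    """Unpack a signed 32-bit RGBA integer (assumed OME Color encoding)."""
    try:
        value = int(raw) & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    except (TypeError, ValueError):
        return None
    return (
        (value >> 24) & 0xFF,  # red
        (value >> 16) & 0xFF,  # green
        (value >> 8) & 0xFF,   # blue
        value & 0xFF,          # alpha
    )


def read_pixel_metadata(xml_data):
    """Collect physical sizes and channel colors for every Pixels element.

    xml_data may be str or bytes. No schema validation happens here, so
    invalid values elsewhere in the document are simply ignored.
    """
    root = ET.fromstring(xml_data)
    results = []
    for elem in root.iter():
        if local_name(elem.tag) != 'Pixels':
            continue
        sizes = {
            axis: to_float(elem.get(f'PhysicalSize{axis}'))
            for axis in 'XYZ'
        }
        channels = [
            {
                'name': child.get('Name'),
                'color': child.get('Color'),  # raw attribute, kept as-is
                'rgba': decode_rgba(child.get('Color')),
            }
            for child in elem
            if local_name(child.tag) == 'Channel'
        ]
        results.append({'physical_size': sizes, 'channels': channels})
    return results
```

To try it on the file from this issue, pass the raw bytes, for example read_pixel_metadata(pathlib.Path('ome-meta.xml').read_bytes()). Matching on the local tag name rather than a fixed namespace is a deliberate choice here, so the sketch does not depend on which OME schema version the exporter declared.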