Closed jni closed 1 year ago
Yeah, this is a bit of a tough one. It's definitely garbage: for example, they use the literal string "Not Set" for Channel/@AcquisitionMode instead of omitting it or using "Other", and they use "Alternate Source" for IlluminationType instead of "Other". They're just not observing the schema, so there's not much we can do here short of implementing a mode that discards all unvalidated data, which I'm not particularly interested in doing at the moment just because someone is including random strings :joy:
But if you want to clean it up manually, this seems to work for me (on main, so let me know if it doesn't on the latest release):
from ome_types import OME, to_dict


def clean_stuff(ome_dict: dict):
    for image in ome_dict['images']:
        for channel in image['pixels']['channels']:
            # or try to do something more clever
            channel.pop('acquisition_mode', None)
            channel.pop('illumination_type', None)
    return ome_dict


my_dict = to_dict('ome-meta.xml', validate=False)
ome = OME(**clean_stuff(my_dict))
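If only a handful of attributes are needed (say, the physical pixel size and the channel colors), another option is to skip validation entirely and read them straight from the XML with the standard library. The following is a rough sketch, not part of ome_types: the helper names `local_name`, `to_float`, and `read_pixel_metadata` are made up for illustration, and it assumes the file still uses the standard OME attribute names (`PhysicalSizeX`/`Y`/`Z` on `Pixels`, `Name` and `Color` on `Channel`).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def read_pixel_metadata(xml_data):
    """Collect physical sizes and channel info for every Pixels element.

    xml_data is the OME-XML document as str or bytes. Nothing is validated
    against the schema; unknown or invalid attributes are simply ignored.
    """
    root = ET.fromstring(xml_data)
    results = []
    for pixels in root.iter():
        if local_name(pixels.tag) != 'Pixels':
            continue
        sizes = {
            axis: to_float(pixels.get(f'PhysicalSize{axis}'))
            for axis in 'XYZ'
        }
        channels = [
            # Color is left as the raw attribute string.
            {'name': child.get('Name'), 'color': child.get('Color')}
            for child in pixels
            if local_name(child.tag) == 'Channel'
        ]
        results.append({'physical_size': sizes, 'channels': channels})
    return results
```

Something like `read_pixel_metadata(open('ome-meta.xml', 'rb').read())` should then give one entry per image. Matching on the local tag name rather than a fixed namespace is a deliberate choice here, so the sketch does not depend on which schema version the exporter declared.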
I'm working with a dataset that was originally in Slidebook format but was exported as an OME-TIFF series. This gave the following alleged OME-XML:
https://gist.github.com/jni/c4b09934715246c158397b24db7fbb3b
I tried to parse it with:
which gives the error:
(Full traceback at: https://gist.github.com/jni/e87f511c892475de72c880b83617e10d)
I fully expect that Slidebook is producing garbage, but I'm wondering if it's easily fixed garbage. At any rate, I'm presently only after the pixel physical size, and potentially channel display colors and contrast limits, so any suggestions for grabbing those reliably from a junk XML would be appreciated. 😃