openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Different element name in UI and when exported as xml or txt - PDF module #896

Closed: leninoc closed this issue 4 months ago

leninoc commented 5 months ago

Hi, I was investigating why the PDFMetadata.Images.Image.NisoImageMetadata.MIMEType element in the PDF module was not getting any values in Rosetta, and found strange differences between the XML and text exports and the JHOVE UI. The JHOVE UI result looks like this:

[Screenshot: JHOVE UI result]

The UI shows a MIMEType field with the value "image/jpg".

When exported as XML, what I think is the same element is labelled differently. See below:

[Screenshot: XML output]

Similarly in txt output:

[Screenshot: text output]

I think the XML/text output has the wrong label for the MIMEType element.

Thank you,
Jan

PDF attached, but you can test on any PDF: PDFA_in_a_Nutshell_211.pdf

samalloing commented 5 months ago

Hi Jan,

The XML/text output is correct according to the MIX specification. The GUI label should be changed to FormatName to follow the specification.

Sam

leninoc commented 5 months ago

Hi Sam, aha, I see now. Thanks for pointing it out; I really should have consulted MIX before drawing any conclusions. From the note in the formatName description in MIX: "Values should be taken from a controlled vocabulary. It is permissible to either list proper format names (e.g., “Adobe PDF”) or MIME types (e.g., “image/tiff” or “image/jp2”)". It's obvious that it should never have been called MIMEType in the JHOVE UI, as a MIME type is only one of the possible values.

So the requested fix should be the other way round: rename the MIMEType field in the UI to FormatName.

cheers Jan
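
For anyone consuming the XML export downstream, the point above (formatName may hold either a MIME type or a proper format name) can be sketched as follows. This is a minimal illustration, not JHOVE or Rosetta code: the XML fragment is a hypothetical sample loosely modelled on MIX 2.0, the namespace URI is assumed, and `format_names` and `looks_like_mime_type` are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment loosely modelled on MIX 2.0 (assumed namespace URI);
# it is NOT copied from real JHOVE output.
SAMPLE = """<mix:mix xmlns:mix="http://www.loc.gov/mix/v20">
  <mix:BasicDigitalObjectInformation>
    <mix:FormatDesignation>
      <mix:formatName>image/jpg</mix:formatName>
    </mix:FormatDesignation>
  </mix:BasicDigitalObjectInformation>
</mix:mix>"""


def format_names(xml_text):
    """Return the text of every element whose local name is formatName."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{uri}local"; keep the local part.
        local = elem.tag.rsplit("}", 1)[-1]
        if local == "formatName" and elem.text:
            names.append(elem.text.strip())
    return names


def looks_like_mime_type(value):
    """Rough check only: exactly one slash, non-empty halves, no whitespace."""
    kind, sep, subtype = value.partition("/")
    if not (sep and kind and subtype) or "/" in subtype:
        return False
    return not any(ch.isspace() for ch in value)


for name in format_names(SAMPLE):
    # A consumer should not assume the value is always a MIME type.
    print(name, looks_like_mime_type(name))
```

Matching on the local name rather than a fixed namespace is a deliberate simplification here, so the sketch does not depend on which MIX version a given export uses.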