openpreserve / odf-validator

Open source Open Document Format (ODF) validation
http://odf.openpreservation.org/
BSD 3-Clause "New" or "Revised" License

Misleading error messages for incorrect versions #150

Open maria-messerschmidt opened 5 months ago

maria-messerschmidt commented 5 months ago

According to Section 4.16.14.2 of the ODF Package Specification, the value of the manifest:version attribute shall be "1.3". This is not checked by the validator.
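To make the missing check concrete, here is a minimal Python sketch of it, not the validator's actual code. It assumes the package is a plain ZIP with the manifest at `META-INF/manifest.xml`; the function names `manifest_version` and `check_manifest_version` and the message wording are invented for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "manifest:" prefix in ODF manifests.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def manifest_version(package):
    """Return the manifest:version value of the root element, or None if absent."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    return root.get(f"{{{MANIFEST_NS}}}version")


def check_manifest_version(package, required="1.3"):
    """Return None when the version matches, otherwise a hypothetical error text."""
    found = manifest_version(package)
    if found == required:
        return None
    if found is None:
        return f'manifest:version is missing; "{required}" is required'
    return f'manifest:version is "{found}"; "{required}" is required'
```

`package` can be a file path or a file-like object, since `zipfile.ZipFile` accepts both.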

When validating an ODF package version 1.0/1.1, the following error is reported: "XML-4: META-INF\manifest.xml [ERROR] Not a valid XML document. Validation exception at line 2 and column 88: element "manifest:manifest" missing required attribute "manifest:version"." (along with other XML-4 errors)

When validating an ODF package version 1.2, no error is reported, and the package validates without issues.

According to the OPF specification, the following check should be done: DOC-2 (Info) OpenDocument version detected. Reports the OpenDocument version of the document.

This is only reported occasionally, and I am not sure what triggers it, since it is absent from the log in most cases I have tested. It is only an INFO level message though, so I am not sure it would make a difference even if the version were logged correctly every time.

But the files should be compliant with version 1.3, and the error message should indicate that it is the version that is the problem.

carlwilson commented 4 months ago

Hi @maria-messerschmidt, at the moment version coverage is officially restricted to v1.3, although the code is ready to support further validation. The error reporting simply forwards the message returned by the XML validation library.

Might it be possible to get a few examples so I can take a look at this behaviour and put some improvements/fixes in place?

maria-messerschmidt commented 4 months ago

Hi @carlwilson,

I have included examples below. The issue relates to files which are not ODS 1.3. I would expect files that are version 1.0/1.1 or 1.2 to generate a relevant error message and report the version. This is not the case. While 1.0/1.1 fails validation, 1.2 passes validation with no errors, and the error generated for 1.0/1.1 does not give any information about the version.

Valid file: ODS version 1.3 (T002.ods)
Output from log: [image]
Issue description: This validates correctly, and the only issue is the non-reporting of version which is tracked in issue #156.

Invalid file: ODS version 1.2 (T105.ods)
Output from log: [image]
Issue description: This validates with no error messages, but I would expect an error message since this is version 1.2, not 1.3.

Invalid file: ODS version 1.0/1.1 (T104.ods)
Output from log: [image]
Issue description: This generates XML-4 and POL_2 error messages. It should generate an error, but XML-4 is not very meaningful here. Nowhere is it reported that the actual problem is that the file version is not 1.3. The manifest in version 1.0/1.1 did not contain the attribute "manifest:version" (probably because there was only the one version), so a manifest without this attribute is a good indicator that the version is 1.0/1.1. I am not sure whether POL_2 is raised because of the XML errors or whether it comes from a separate check.

I guess one issue is that versions (and extended versions, as reported in #151) are not indicated in a simple way in the ODS files. For 1.2 and 1.3, the version is indicated in the manifest, but this is not the case for 1.0/1.1.

I hope the explanation and examples are useful. If you need anything else, pls let me know :)

maria-messerschmidt commented 2 months ago

Version can vary across the different files, but the value for office:version in the header of content.xml seems to always be populated, even for version 1.
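As a rough illustration of reading that value, the Python sketch below pulls `office:version` from the root element of `content.xml` without parsing the whole document. It is an assumption-laden sketch, not the validator's implementation; the function name `office_version` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "office:" prefix in ODF documents.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def office_version(package, member="content.xml"):
    """Return office:version from the root element of `member`, or None if absent."""
    with zipfile.ZipFile(package) as zf:
        with zf.open(member) as stream:
            # The first "start" event is the root element; its attributes are
            # already available, so the rest of the (possibly large) file is skipped.
            for _event, element in ET.iterparse(stream, events=("start",)):
                return element.get(f"{{{OFFICE_NS}}}version")
    return None
```

The same function could be pointed at `styles.xml` or `meta.xml` through the `member` argument to compare versions across the files in a package.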

Ideally, we would have a single error message for this and #151 saying something like (with the relevant selection based on validation of the file):

"The file must be an ODF package v1.3. Version detected: {1.0/1.1 | 1.2 | 1.2 Extended package | 1.3 Extended package}."

There are generally no XML errors for v1.2, and as specified in #151 the XML-4 errors for extended packages should not be reported separately. There are two XML-4 errors for v1.0/1.1, which should also not be reported separately if there is a consolidated error message for incorrect version/format:

XML-4: META-INF\manifest.xml [ERROR] Not a valid XML document. Validation exception at line 2 and column 88: element "manifest:manifest" missing required attribute "manifest:version".
XML-4: content.xml [ERROR] Not a valid XML document. Validation exception at line 2 and column 3095: attribute "table:use-wildcards" not allowed here; expected attribute "table:null-year", "table:precision-as-shown" or "table:search-criteria-must-apply-to-whole-cell".
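The consolidation proposed above could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the `Message` type, the function names, the use of `POL_2` as the code for the consolidated message, and the rule that a missing `manifest:version` means 1.0/1.1. How an extended package is detected (#151) is left to the caller.

```python
from dataclasses import dataclass

REQUIRED_VERSION = "1.3"


@dataclass(frozen=True)
class Message:
    code: str   # e.g. "XML-4"
    level: str  # e.g. "ERROR"
    text: str


def version_error(manifest_version, extended=False):
    """Return the consolidated message text, or None for a plain v1.3 package."""
    if manifest_version == REQUIRED_VERSION and not extended:
        return None
    # Assumption from this thread: no manifest:version attribute means 1.0/1.1.
    label = "1.0/1.1" if manifest_version is None else manifest_version
    if extended:
        label += " Extended package"
    return (f"The file must be an ODF package v{REQUIRED_VERSION}. "
            f"Version detected: {label}.")


def consolidate(messages, manifest_version, extended=False, code="POL_2"):
    """Replace XML-4 messages with one version message when the version is wrong."""
    text = version_error(manifest_version, extended)
    if text is None:
        return list(messages)
    kept = [m for m in messages if m.code != "XML-4"]
    kept.append(Message(code, "ERROR", text))
    return kept
```

For a valid plain v1.3 package the message list is returned unchanged, so genuine XML-4 errors in a 1.3 file would still be reported.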

maria-messerschmidt commented 2 months ago

1.2 now works and generates a POL_2 error message indicating the version.

1.0/1.1 still has some issues and produces the following:

C:\odf\odf-validator-main>odf-validator.bat -p "C:\Users\maria\Desktop\2024-07\AT033\AT033.ods"
APP-1: [INFO] Validating C:\Users\maria\Desktop\2024-07\AT033\AT033.ods.
APP-5: [INFO] DNA ODF Spreadsheets Preservation Specification Profile report for C:\Users\maria\Desktop\2024-07\AT033\AT033.ods.
DOC-2: [INFO] package OpenDocument version 1.3 detected.
DOC-3: [INFO] mimetype OpenDocument MIMETYPE application/vnd.oasis.opendocument.spreadsheet detected
XML-4: [ERROR] META-INF\manifest.xml Not a valid XML document. Validation exception at line 2 and column 171: element "manifest:manifest" missing required attribute "manifest:version".
POL_2: [ERROR] AT033.ods Standard Compliance | Package does not comply with specification. The file MUST comply with the standard "OASIS Open Document Format for Office Applications (OpenDocument) v1.3".
NOT VALID, 2 errors, 0 warnings and 2 info messages.