openpreserve / odf-validator

Open source Open Document Format (ODF) validation
http://odf.openpreservation.org/
BSD 3-Clause "New" or "Revised" License

POL_1 not always reported #161

Open maria-messerschmidt opened 5 months ago

maria-messerschmidt commented 5 months ago

This is somewhat related to #160 since POL_1 should catch all encrypted files, not just the ones that aren't stored. But the examples from #160 show that this is not the case. In addition to the examples in #160, I have added a few additional ones that specifically relate to POL_1 below.

Scenario 1: Working scenario -> POL_1

POL_1 is correctly reported if a file is saved with encryption/password from e.g. LibreOffice. For example:

E002b.ods

Which produces this error log:

C:\odf\odf-validator-main>odf-validator.bat -p "filer\testfiler\E002b.ods"
APP-1: [INFO] Validating filer\testfiler\E002b.ods.
APP-5: [INFO] DNA ODF Spreadsheets Preservation Specification Profile report for filer\testfiler\E002b.ods.
POL_1: Object 1\styles.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Object 1\content.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: settings.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Object 1\settings.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: manifest.rdf [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: meta.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: ObjectReplacements\Object 1 [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Pictures\10000000000001AE000002009161D160.png [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: styles.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: content.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
NOT VALID, 10 errors, 0 warnings and 0 info messages.

However, this is the only scenario in which POL_1 is reported.

Also, the zip entries in this file are not actually flagged as encrypted at the zip level (although the document is password-protected, so reporting POL_1 is valid).

For example, content.xml from this file:

Path = content.xml
Folder = -
Size = 4656
Packed Size = 4656
Modified = 2024-06-26 09:12:38
Created =
Accessed =
Attributes =
Encrypted = -
Comment =
CRC = DB07EAF9
Method = Store
Characteristics = Descriptor UTF8
Host OS = FAT
Version = 20
Volume Index = 0
Offset = 18854
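The difference between the two kinds of encryption can be checked outside the validator. The sketch below is not the validator's code. It assumes that an office application records ODF-level encryption per entry as a `manifest:encryption-data` child element in META-INF/manifest.xml (as described in the ODF package specification), while zip-level encryption sets bit 0 of the entry's general purpose flag. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/manifest.xml as defined by the ODF package specification.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ZIP_ENCRYPTED_FLAG = 0x1  # bit 0 of the zip general purpose bit flag


def zip_level_encrypted(package):
    """Return names of entries whose zip header has the encryption bit set."""
    with zipfile.ZipFile(package) as zf:
        return [i.filename for i in zf.infolist() if i.flag_bits & ZIP_ENCRYPTED_FLAG]


def manifest_level_encrypted(package):
    """Return paths whose manifest file-entry has a manifest:encryption-data child."""
    with zipfile.ZipFile(package) as zf:
        # Raises RuntimeError if the manifest itself is zip-encrypted.
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    encrypted = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        if entry.find(f"{{{MANIFEST_NS}}}encryption-data") is not None:
            encrypted.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return encrypted
```

If that assumption holds, a file like E002b.ods would be expected to give an empty list from `zip_level_encrypted` and a non-empty list from `manifest_level_encrypted`, which would match the `Encrypted = -` attribute shown above.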

Scenario 2: encrypting a full ODS-package

I tried encrypting the ODS-file itself and checked that all zip entries were encrypted.

E002c.ods

Example of attributes for content.xml:

Path = content.xml
Folder = -
Size = 23922
Packed Size = 3990
Modified = 2024-06-26 09:28:02.0000000
Created =
Accessed =
Attributes =
Encrypted = +
Comment =
CRC = 64587F76
Method = pkAES-256 Deflate
Characteristics = NTFS StrongCrypto : Encrypt StrongCrypto UTF8
Host OS = FAT
Version = 51
Volume Index = 0
Offset = 31176

I then ran validation, which produced the following:

C:\odf\odf-validator-main>odf-validator.bat -p "filer\testfiler\E002c.ods"
APP-1: [INFO] Validating filer\testfiler\E002c.ods.
APP-2: [ERROR] Unsupported feature encryption used in entry settings.xml

So here we are back to the APP-2 error, or, when the validator is run without a profile, to the various errors documented in #160 that appear instead of APP-2:

C:\odf\odf-validator-main>odf-validator.bat "filer\testfiler\E002c.ods"
APP-1: [INFO] Validating filer\testfiler\E002c.ods.
org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature encryption used in entry settings.xml
    at org.apache.commons.compress.archivers.zip.ZipUtil.checkRequestedFeatures(ZipUtil.java:147)
    at org.apache.commons.compress.archivers.zip.ZipFile.getInputStream(ZipFile.java:953)
    at org.openpreservation.format.zip.ZipFileProcessor.getEntryInputStream(ZipFileProcessor.java:116)
    at org.openpreservation.odf.pkg.PackageParserImpl.processEntry(PackageParserImpl.java:129)
    at org.openpreservation.odf.pkg.PackageParserImpl.processZipEntries(PackageParserImpl.java:109)
    at org.openpreservation.odf.pkg.PackageParserImpl.parsePackage(PackageParserImpl.java:100)
    at org.openpreservation.odf.pkg.PackageParserImpl.parsePackage(PackageParserImpl.java:70)
    at org.openpreservation.odf.validation.ValidatingParserImpl.parsePackage(ValidatingParserImpl.java:74)
    at org.openpreservation.odf.validation.Validator.validatePackage(Validator.java:107)
    at org.openpreservation.odf.validation.Validator.validate(Validator.java:83)
    at org.openpreservation.odf.apps.CliValidator.validatePath(CliValidator.java:68)
    at org.openpreservation.odf.apps.CliValidator.call(CliValidator.java:60)
    at org.openpreservation.odf.apps.CliValidator.call(CliValidator.java:35)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at org.openpreservation.odf.apps.CliValidator.main(CliValidator.java:87)

Scenario 3: working POL_1 scenario without profile

I thought I would try to see what happened if I tried validation on the file that was saved with password from LibreOffice. I assume this should be a valid file since the content is encrypted, but it is also stored.

E002b.ods

However, the output has a number of errors:

C:\odf\odf-validator-main>odf-validator.bat "filer\testfiler\E002b.ods"
APP-1: [INFO] Validating filer\testfiler\E002b.ods.
APP-4: [INFO] Validation report for filer\testfiler\E002b.ods.
XML-3: settings.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 4-byte UTF-8 sequence..
DOC-3: mimetype [INFO] OpenDocument MIMETYPE application/vnd.oasis.opendocument.spreadsheet detected
XML-3: manifest.rdf [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 2-byte UTF-8 sequence..
XML-3: meta.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 1 of 1-byte UTF-8 sequence..
PKG-7: Thumbnails\thumbnail.png [WARNING] An OpenDocument Package SHOULD contain a preview image Thumbnails/thumbnail.png.
XML-3: content.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 1 of 1-byte UTF-8 sequence..
XML-3: styles.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 2-byte UTF-8 sequence..
NOT VALID, 5 errors, 1 warnings and 1 info messages.

PKG-7 is expected since the thumbnail isn't generated in this scenario, but I am not sure about the rest.

maria-messerschmidt commented 3 months ago

I have tested the fix for this and am getting the following logs:

C:\odf\odf-validator-main>odf-validator.bat -p "C:\Users\maria\Desktop\2024-07\AT068\AT068.ods"
APP-1: [INFO] Validating C:\Users\maria\Desktop\2024-07\AT068\AT068.ods.
SYS-1: [ERROR] Package could not be parsed, due to an exception. | The following zip entries could not be read:
settings.xml: Unsupported Zip feature: compression method
META-INF/manifest.xml: Unsupported Zip feature: compression method
manifest.rdf: Unsupported Zip feature: compression method
mimetype: Unsupported Zip feature: compression method

C:\odf\odf-validator-main>odf-validator.bat -p "C:\Users\maria\Desktop\2024-07\AT040\AT040tmp.ods"
APP-1: [INFO] Validating C:\Users\maria\Desktop\2024-07\AT040\AT040tmp.ods.
SYS-1: [ERROR] Package could not be parsed, due to an exception. | The following zip entries could not be read:
settings.xml: Unsupported Zip feature: compression method
manifest.rdf: Unsupported Zip feature: compression method
meta.xml: Unsupported Zip feature: compression method
styles.xml: Unsupported Zip feature: compression method

It would be good to (1) ensure there is still a policy error of some sort (POL_1 or POL_2) and (2) distinguish between compression and encryption if possible.
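One possible way to tell the two cases apart is to look only at the zip headers, without reading any entry data. The sketch below is an illustration and not the validator's implementation. It assumes an entry counts as encrypted when bit 0 of its general purpose flag is set or when its compression method is 99 (the marker the zip specification assigns to WinZip AES entries, which some readers may report as an unsupported compression method). It also assumes Stored and Deflate are the only methods the validator can read. Whether the AT068 and AT040 files fall into either group has not been checked. All names are hypothetical.

```python
import zipfile

# Assumption: the compression methods the validator is able to read.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
WINZIP_AES_METHOD = 99  # method value used as a marker for WinZip AES encryption
ENCRYPTED_FLAG = 0x1    # bit 0 of the zip general purpose bit flag


def classify_entries(package):
    """Map each entry name to 'encrypted', 'unsupported-compression' or 'ok'."""
    result = {}
    with zipfile.ZipFile(package) as zf:
        # infolist() only reads the central directory, so no entry is decoded here.
        for info in zf.infolist():
            if info.flag_bits & ENCRYPTED_FLAG or info.compress_type == WINZIP_AES_METHOD:
                result[info.filename] = "encrypted"
            elif info.compress_type not in SUPPORTED_METHODS:
                result[info.filename] = "unsupported-compression"
            else:
                result[info.filename] = "ok"
    return result
```

With a classification like this, entries marked `encrypted` could be mapped to an encryption policy message and the rest to a separate compression message, instead of a single SYS-1 error.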

When running the validator without profile, XML-3 errors are still generated for encrypted files since these cannot be parsed.

maria-messerschmidt commented 2 months ago

We also need to check how the new error for this is surfaced through the API. I will update the issue with more details once I have been able to test this, but as discussed, this will likely not work with the API at present.