Open maria-messerschmidt opened 5 months ago
I have tested the fix for this and am getting the following logs:
C:\odf\odf-validator-main>odf-validator.bat -p "C:\Users\maria\Desktop\2024-07\AT068\AT068.ods"
APP-1: [INFO] Validating C:\Users\maria\Desktop\2024-07\AT068\AT068.ods.
SYS-1: [ERROR] Package could not be parsed, due to an exception. | The following zip entries could not be read:
settings.xml: Unsupported Zip feature: compression method
META-INF/manifest.xml: Unsupported Zip feature: compression method
manifest.rdf: Unsupported Zip feature: compression method
mimetype: Unsupported Zip feature: compression method

C:\odf\odf-validator-main>odf-validator.bat -p "C:\Users\maria\Desktop\2024-07\AT040\AT040tmp.ods"
APP-1: [INFO] Validating C:\Users\maria\Desktop\2024-07\AT040\AT040tmp.ods.
SYS-1: [ERROR] Package could not be parsed, due to an exception. | The following zip entries could not be read:
settings.xml: Unsupported Zip feature: compression method
manifest.rdf: Unsupported Zip feature: compression method
meta.xml: Unsupported Zip feature: compression method
styles.xml: Unsupported Zip feature: compression method
It would be good to (1) ensure that a policy error of some sort (POL_1 and POL_2) is still reported, and (2) distinguish between compression and encryption if possible.
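To illustrate what distinguishing the two cases could look like, here is a minimal sketch that classifies a zip entry from its local file header. It is not taken from odf-validator: the class `ZipEntryTriage`, its `Kind` enum and both method names are hypothetical. It relies on the ZIP format convention that bit 0 of the general purpose bit flag marks an encrypted entry, and it assumes that only the stored (0) and deflated (8) compression methods count as supported.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not part of odf-validator. */
final class ZipEntryTriage {
    enum Kind { READABLE, ENCRYPTED, UNSUPPORTED_COMPRESSION }

    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // "PK\3\4", little-endian
    private static final int FLAG_ENCRYPTED = 0x0001;             // general purpose bit 0
    private static final int METHOD_STORED = 0;
    private static final int METHOD_DEFLATED = 8;

    private ZipEntryTriage() { }

    /** Classifies an entry from its general purpose flags and compression method. */
    static Kind classify(int generalPurposeFlags, int compressionMethod) {
        // Encryption is checked first: an encrypted entry is unreadable
        // whatever its compression method is.
        if ((generalPurposeFlags & FLAG_ENCRYPTED) != 0) {
            return Kind.ENCRYPTED;
        }
        // Assumption: anything other than stored/deflated is treated as unsupported.
        if (compressionMethod != METHOD_STORED && compressionMethod != METHOD_DEFLATED) {
            return Kind.UNSUPPORTED_COMPRESSION;
        }
        return Kind.READABLE;
    }

    /** Reads flags (offset 6) and method (offset 8) from the first bytes of a local file header. */
    static Kind classifyLocalHeader(byte[] header) {
        if (header.length < 10) {
            throw new IllegalArgumentException("Need at least 10 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("Not a zip local file header");
        }
        int flags = buf.getShort(6) & 0xFFFF;
        int method = buf.getShort(8) & 0xFFFF;
        return classify(flags, method);
    }
}
```

With this split, an encrypted entry could be mapped to an encryption policy message and an unknown compression method to a separate one, instead of both ending up in the same "could not be read" bucket. How the validator would actually report each case is left open here.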
When running the validator without a profile, XML-3 errors are still generated for encrypted files, since these cannot be parsed.
We also need to check how the new error is handled in the API. I will update the issue with more details once I have been able to test this but, as discussed, this is unlikely to work with the API at present.
This is somewhat related to #160 since POL_1 should catch all encrypted files, not just the ones that aren't stored. But the examples from #160 show that this is not the case. In addition to the examples in #160, I have added a few additional ones that specifically relate to POL_1 below.
Scenario 1: Working scenario -> POL_1
POL_1 is correctly reported if a file is saved with encryption/a password from e.g. LibreOffice. For example:
E002b.ods
This produces the following error log:
C:\odf\odf-validator-main>odf-validator.bat -p "filer\testfiler\E002b.ods"
APP-1: [INFO] Validating filer\testfiler\E002b.ods.
APP-5: [INFO] DNA ODF Spreadsheets Preservation Specification Profile report for filer\testfiler\E002b.ods.
POL_1: Object 1\styles.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Object 1\content.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: settings.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Object 1\settings.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: manifest.rdf [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: meta.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: ObjectReplacements\Object 1 [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: Pictures\10000000000001AE000002009161D160.png [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: styles.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
POL_1: content.xml [ERROR] Encryption | The package MUST NOT contain any encrypted entries.
NOT VALID, 10 errors, 0 warnings and 0 info messages.
However, this is the only scenario where this is caught.
Also, the zip entries themselves are not flagged as encrypted at the zip level (although the document is password-protected, so POL_1 is valid).
For example, content.xml from this file:
Path = content.xml
Folder = -
Size = 4656
Packed Size = 4656
Modified = 2024-06-26 09:12:38
Created =
Accessed =
Attributes =
Encrypted = -
Comment =
CRC = DB07EAF9
Method = Store
Characteristics = Descriptor UTF8
Host OS = FAT
Version = 20
Volume Index = 0
Offset = 18854
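The listing above shows no zip-level encryption, so in this scenario the encryption has to be signalled elsewhere. ODF packages record it in META-INF/manifest.xml, where an encrypted entry carries a `manifest:encryption-data` child element. The sketch below lists such entries from a manifest. Whether POL_1 is implemented this way in the validator is an assumption on my part, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration only; not part of odf-validator. */
final class ManifestEncryptionScan {
    static final String MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0";

    private ManifestEncryptionScan() { }

    /** Returns the full-path of every file-entry that has an encryption-data child. */
    static List<String> encryptedEntries(byte[] manifestXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(manifestXml));

            List<String> encrypted = new ArrayList<>();
            NodeList entries = doc.getElementsByTagNameNS(MANIFEST_NS, "file-entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                boolean hasEncryptionData =
                        entry.getElementsByTagNameNS(MANIFEST_NS, "encryption-data").getLength() > 0;
                if (hasEncryptionData) {
                    encrypted.add(entry.getAttributeNS(MANIFEST_NS, "full-path"));
                }
            }
            return encrypted;
        } catch (Exception e) {
            // Wrapped so callers do not have to handle checked parser exceptions.
            throw new IllegalArgumentException("Manifest could not be parsed", e);
        }
    }
}
```

A check like this only needs the manifest, which stays readable in a password-protected document, so it would not depend on being able to parse the encrypted entries themselves. XML parser hardening is left out of the sketch for brevity.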
Scenario 2: Encrypting a full ODS package
I tried encrypting the ODS file itself and checked that all zip entries were encrypted.
E002c.ods
Example of attributes for content.xml:
Path = content.xml
Folder = -
Size = 23922
Packed Size = 3990
Modified = 2024-06-26 09:28:02.0000000
Created =
Accessed =
Attributes =
Encrypted = +
Comment =
CRC = 64587F76
Method = pkAES-256 Deflate
Characteristics = NTFS StrongCrypto : Encrypt StrongCrypto UTF8
Host OS = FAT
Version = 51
Volume Index = 0
Offset = 31176
I then ran validation, which produced the following:
C:\odf\odf-validator-main>odf-validator.bat -p "filer\testfiler\E002c.ods"
APP-1: [INFO] Validating filer\testfiler\E002c.ods.
APP-2: [ERROR] Unsupported feature encryption used in entry settings.xml
So here we are back to the APP-2 error and, when the validator is run without a profile, to the various errors documented in #160 that appear instead of APP-2.
C:\odf\odf-validator-main>odf-validator.bat "filer\testfiler\E002c.ods"
APP-1: [INFO] Validating filer\testfiler\E002c.ods.
org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature encryption used in entry settings.xml
    at org.apache.commons.compress.archivers.zip.ZipUtil.checkRequestedFeatures(ZipUtil.java:147)
    at org.apache.commons.compress.archivers.zip.ZipFile.getInputStream(ZipFile.java:953)
    at org.openpreservation.format.zip.ZipFileProcessor.getEntryInputStream(ZipFileProcessor.java:116)
    at org.openpreservation.odf.pkg.PackageParserImpl.processEntry(PackageParserImpl.java:129)
    at org.openpreservation.odf.pkg.PackageParserImpl.processZipEntries(PackageParserImpl.java:109)
    at org.openpreservation.odf.pkg.PackageParserImpl.parsePackage(PackageParserImpl.java:100)
    at org.openpreservation.odf.pkg.PackageParserImpl.parsePackage(PackageParserImpl.java:70)
    at org.openpreservation.odf.validation.ValidatingParserImpl.parsePackage(ValidatingParserImpl.java:74)
    at org.openpreservation.odf.validation.Validator.validatePackage(Validator.java:107)
    at org.openpreservation.odf.validation.Validator.validate(Validator.java:83)
    at org.openpreservation.odf.apps.CliValidator.validatePath(CliValidator.java:68)
    at org.openpreservation.odf.apps.CliValidator.call(CliValidator.java:60)
    at org.openpreservation.odf.apps.CliValidator.call(CliValidator.java:35)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at org.openpreservation.odf.apps.CliValidator.main(CliValidator.java:87)
Scenario 3: Working POL_1 scenario without a profile
I wanted to see what happens when the file that was saved with a password from LibreOffice is validated without a profile. I assume this should be a valid file, since the content is encrypted but the entries are also stored.
E002b.ods
However, the output has a number of errors:
C:\odf\odf-validator-main>odf-validator.bat "filer\testfiler\E002b.ods"
APP-1: [INFO] Validating filer\testfiler\E002b.ods.
APP-4: [INFO] Validation report for filer\testfiler\E002b.ods.
XML-3: settings.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 4-byte UTF-8 sequence..
DOC-3: mimetype [INFO] OpenDocument MIMETYPE application/vnd.oasis.opendocument.spreadsheet detected
XML-3: manifest.rdf [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 2-byte UTF-8 sequence..
XML-3: meta.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 1 of 1-byte UTF-8 sequence..
PKG-7: Thumbnails\thumbnail.png [WARNING] An OpenDocument Package SHOULD contain a preview image Thumbnails/thumbnail.png.
XML-3: content.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 1 of 1-byte UTF-8 sequence..
XML-3: styles.xml [ERROR] Not a well formed XML document. XML parsing exception at line 1 and column 1: Invalid byte 2 of 2-byte UTF-8 sequence..
NOT VALID, 5 errors, 1 warnings and 1 info messages.
PKG-7 is expected since the thumbnail isn't generated in this scenario, but I am not sure about the rest.