keeps / commons-ip

Commons IP is a project that provides a command-line tool and a Java library to validate and manipulate E-ARK Information Packages. It can create or process E-ARK SIPs and AIPs and validate them against the official specifications.
http://keeps.github.io/commons-ip/
GNU Lesser General Public License v3.0

Add support for existing checksums to METS #280

Closed mrBaas closed 1 month ago

mrBaas commented 3 months ago

I was hoping that calling setChecksum() on the IPFile class would skip checksum generation when building the METS/ZIP. But after reading through the source code, that doesn't seem to be the case?

We're dealing with a large volume of data (in the terabyte range) that is already covered by checksums, so it seems wasteful to generate them all over again while creating the SIP.

luis100 commented 3 months ago

During construction of the SIP we validate each file, ensuring that the reported checksum matches the checksum calculated from the file. I guess you want to disable this check because you are confident that copying and zipping the package will work well, or because you are willing to accept the risk given the time constraints.
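To illustrate the kind of check described here, a minimal sketch (not the commons-ip source) could look like the following. The class and method names are hypothetical; only the JDK's `MessageDigest` and `Files` APIs are used. It streams the file, so the cost grows with file size, which is why the check is expensive for terabyte-scale packages.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of commons-ip.
final class ChecksumCheck {

  // Streams the file through the digest and returns the lowercase hex checksum.
  static String checksumHex(Path file, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance(algorithm);
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  // True when the reported checksum equals the one calculated from the file.
  static boolean matchesReported(Path file, String algorithm, String reported)
      throws IOException, NoSuchAlgorithmException {
    return checksumHex(file, algorithm).equalsIgnoreCase(reported);
  }
}
```

A caller would pass the algorithm name (for example `"SHA-256"`) and the checksum already stored for the file, and treat a `false` result as a validation failure.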

This is not the usual use case, but it should be an easy thing to turn off by making this variable configurable (from an environment variable?), defaulting to true.
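As a rough sketch of that idea, assuming a flag read from an environment variable that defaults to true when unset: the variable name `VALIDATE_FILE_CHECKSUMS` and the class below are made up for illustration and are not what commons-ip uses.

```java
// Hypothetical configuration helper for illustration; not part of commons-ip.
final class ChecksumValidationConfig {

  // Assumed environment variable name, chosen only for this example.
  static final String ENV_NAME = "VALIDATE_FILE_CHECKSUMS";

  // Returns defaultValue when the raw value is missing or blank,
  // otherwise parses it as a boolean ("true" in any case is true).
  static boolean parseFlag(String raw, boolean defaultValue) {
    if (raw == null || raw.trim().isEmpty()) {
      return defaultValue;
    }
    return Boolean.parseBoolean(raw.trim());
  }

  // Validation stays enabled unless the environment explicitly disables it.
  static boolean isValidationEnabled() {
    return parseFlag(System.getenv(ENV_NAME), true);
  }
}
```

Keeping the default at true means existing users keep the safety check, and only callers who explicitly set the variable to `false` accept the risk of skipping it.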

https://github.com/keeps/commons-ip/blob/f7661942646f8aacfab02aebae71d3413f5de084/src/main/java/org/roda_project/commons_ip/utils/Utils.java#L184

hmiguim commented 1 month ago

A new property, skipChecksumCalculation, was added; it skips the checksum validation. More information can be found in the README.md file.