nlnwa / veidemann-warcvalidator

Apache License 2.0

Digital Preservation System compliance #27

Open trym-b opened 11 months ago

trym-b commented 11 months ago

Currently, veidemann-warcvalidator generates checksums in the pre-DPS format: <32-char-md5> <filepath>\n

DPS requires checksums in the following format: <32-char-md5> *<filepath>\n. The checksum files should also be named checksum_transferred.md5.

The checksum format should be changed as soon as the downstream implementations can support it. Bear in mind that both the rclone and veidemann-c-potet projects would need to be thoroughly tested with the new files created by warcvalidator.

In addition, one could argue that this tool should not create the checksums at all, especially if we want others to use veidemann too.

maeb commented 10 months ago

When generating a checksum file we try to mimic the output of the md5sum tool. The md5sum tool signals that a file has been read in binary mode (-b flag) by substituting the last of the two space separators with a *.
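To make the mode marker concrete, here is a minimal sketch of how a downstream consumer might read such a line. The class `Md5sumLineParser` and its `Entry` type are hypothetical names for illustration and are not part of veidemann-warcvalidator, rclone, or veidemann-c-potet. It assumes the trailing newline has already been stripped and ignores md5sum's backslash-escaping of unusual file names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not taken from any veidemann project. */
final class Md5sumLineParser {

    // 32 hex digits, one space, a mode marker (' ' = text, '*' = binary), then the path.
    private static final Pattern LINE =
            Pattern.compile("([0-9a-fA-F]{32}) ([ *])(.+)");

    /** Simple value holder for one parsed checksum line. */
    static final class Entry {
        final String md5;
        final boolean binary;
        final String path;

        Entry(String md5, boolean binary, String path) {
            this.md5 = md5;
            this.binary = binary;
            this.path = path;
        }
    }

    /** Parses one md5sum-style line (without its trailing newline). */
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an md5sum-style line: " + line);
        }
        boolean binary = m.group(2).equals("*");
        return new Entry(m.group(1).toLowerCase(Locale.ROOT), binary, m.group(3));
    }
}
```

A parser written this way accepts both the text-mode and the binary-mode form, so it would keep working while the producer switches from one separator to the other.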

Since the code already reads the file in binary mode, the only thing we need to do to comply is to change the separator here: https://github.com/nlnwa/veidemann-warcvalidator/blob/86f4813532259a946df99d089aba8421b2c58c2e/src/main/java/no/nb/nna/veidemann/warcvalidator/service/ValidationService.java#L85
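As a rough illustration of the proposed change, the sketch below hashes a stream of raw bytes and formats the result with the ` *` separator. It is not the code in ValidationService.java; the class `BinaryModeChecksum` and its method names are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; not the actual ValidationService implementation. */
final class BinaryModeChecksum {

    /** Computes the lowercase hex MD5 of the raw bytes read from the stream. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Formats one line as "<32-char-md5> *<filepath>\n" (binary-mode marker). */
    static String checksumLine(String md5Hex, String filePath) {
        return md5Hex + " *" + filePath + "\n";
    }
}
```

A caller could open the WARC file with `Files.newInputStream`, pass the stream to `md5Hex`, and write the result of `checksumLine` to checksum_transferred.md5.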