keeps / commons-ip

Commons IP is a project that provides a command-line tool and a Java library to validate and manipulate E-ARK Information Packages, i.e. to create or process E-ARK SIPs and AIPs and to validate them against the official specifications.
http://keeps.github.io/commons-ip/
GNU Lesser General Public License v3.0

Issue with Incorrect Checksum for representation METS in SIP Creation #250

Closed: JohannesKarlsen99 closed this issue 6 months ago

JohannesKarlsen99 commented 7 months ago

When creating a Submission Information Package (SIP) with the commons-ip library, I encountered unexpected behavior regarding checksums. Although I explicitly set the checksum algorithm to MD5 using the setChecksum method of the IP.java class, the resulting SIP contains a SHA-256 checksum for the representation METS.xml. I'm not sure whether this behavior is intentional or indicates a bug in the commons-ip library.

Example fileSec:

<fileSec ID="uuid-C88FF056-B090-4244-BD3D-1CD98734D26D">
        <fileGrp ID="uuid-1B616075-4F38-42C0-AB06-7D50B3707320" USE="Schemas">
            <file ID="ID-D16EFBD6-65FF-4172-BBC9-CB0ABC7600AA" MIMETYPE="application/octet-stream" SIZE="2038" CREATED="2024-02-05T09:46:30.234+01:00" CHECKSUM="EB72EF8AB5B1C93801DFACBFE6AA8E27" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/DILCISExtensionMETS.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-23206CA9-C9B1-4A08-86B5-1C76CC0D1AF7" MIMETYPE="application/octet-stream" SIZE="499" CREATED="2024-02-05T09:46:30.241+01:00" CHECKSUM="83DA1FF6F35ADEECE3CCCFB5E2E9F83A" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/DILCISExtensionSIPMETS.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-86C50098-24A4-48FB-BFFF-E331EFB61DE8" MIMETYPE="application/octet-stream" SIZE="137125" CREATED="2024-02-05T09:46:30.247+01:00" CHECKSUM="0504DEDC1251E87D7E85F9FF2DBADC0D" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/mets1_12.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-99663FA8-1243-4EE4-BD3D-A058B5E4500A" MIMETYPE="application/octet-stream" SIZE="3180" CREATED="2024-02-05T09:46:30.252+01:00" CHECKSUM="6BDC7F9459A502964F889D70A335CECE" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/xlink.xsd" LOCTYPE="URL"/>
            </file>
        </fileGrp>
        <fileGrp ID="uuid-F11C5D3F-FF82-44D2-992D-D799C16F8803" USE="Representations/originals-001">
            <file ID="ID-94569F65-F870-4293-B668-B2155A262AA6" MIMETYPE="application/xml" SIZE="1199" CREATED="2024-02-05T09:47:30.792+01:00" CHECKSUM="0EF2DA26742DFD642192896A7FDC92C0267D23964848F25C26F0261035860550" CHECKSUMTYPE="SHA-256">
                <FLocat xlink:type="simple" xlink:href="representations/originals-001/METS.xml" LOCTYPE="URL"/>
            </file>
        </fileGrp>
</fileSec>
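The mismatch in this excerpt is visible from the CHECKSUMTYPE attributes alone, and also from the checksum lengths (32 hex characters for MD5, 64 for SHA-256). Below is a minimal sketch of a check that flags such entries in a generated METS file. It assumes the JDK's built-in DOM parser and unprefixed `file` elements, as in the excerpt. The class and method names are hypothetical and not part of commons-ip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of commons-ip.
public class ChecksumTypeCheck {

    /** Returns the IDs of <file> elements whose CHECKSUMTYPE differs from expectedType. */
    public static List<String> findMismatches(String metsXml, String expectedType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Plain (non-namespace-aware) parsing, so tag names are matched literally
            // and an excerpt without namespace declarations still parses.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metsXml.getBytes(StandardCharsets.UTF_8)));

            List<String> mismatches = new ArrayList<>();
            NodeList files = doc.getElementsByTagName("file");
            for (int i = 0; i < files.getLength(); i++) {
                Element file = (Element) files.item(i);
                if (!expectedType.equals(file.getAttribute("CHECKSUMTYPE"))) {
                    mismatches.add(file.getAttribute("ID"));
                }
            }
            return mismatches;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse METS document", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed-down version of the fileSec above: one MD5 entry, one SHA-256 entry.
        String sample = "<fileSec>"
                + "<fileGrp USE=\"Schemas\"><file ID=\"ID-A\" CHECKSUMTYPE=\"MD5\"/></fileGrp>"
                + "<fileGrp USE=\"Representations/originals-001\">"
                + "<file ID=\"ID-B\" CHECKSUMTYPE=\"SHA-256\"/></fileGrp>"
                + "</fileSec>";
        System.out.println(findMismatches(sample, "MD5")); // prints [ID-B]
    }
}
```

Run against the full fileSec above with an expected type of MD5, this check would report only the representation METS entry.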
luis100 commented 7 months ago

There seem to be a few places where the CHECKSUM_ALGORITHM constant (which defaults to SHA-256) is used instead of the configured parameter. Those places should read the algorithm from the SIP instance.
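The fix described here amounts to reading the algorithm from the package instance wherever a digest is computed, rather than from the static default. The sketch below illustrates that pattern using the standard `java.security.MessageDigest` API. `PackageSettings` and `checksumFor` are hypothetical stand-ins, not the actual commons-ip classes. Only the `CHECKSUM_ALGORITHM` name, its SHA-256 default, and the idea of a `setChecksum` setter come from this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumFromInstance {

    // Library-wide default, as described in the comment (SHA-256).
    static final String CHECKSUM_ALGORITHM = "SHA-256";

    /** Hypothetical stand-in for a SIP/IP instance that stores the configured algorithm. */
    static class PackageSettings {
        private String checksum = CHECKSUM_ALGORITHM;

        void setChecksum(String algorithm) { this.checksum = algorithm; }

        String getChecksum() { return checksum; }
    }

    /** Uppercase hex digest computed with the instance's algorithm, not the static constant. */
    static String checksumFor(PackageSettings sip, byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance(sip.getChecksum());
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02X", b)); // %X treats a negative byte as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported checksum algorithm: " + sip.getChecksum(), e);
        }
    }

    public static void main(String[] args) {
        PackageSettings sip = new PackageSettings();
        sip.setChecksum("MD5");
        byte[] mets = "abc".getBytes(StandardCharsets.UTF_8);
        // prints MD5 900150983CD24FB0D6963F7D28E17F72
        System.out.println(sip.getChecksum() + " " + checksumFor(sip, mets));
    }
}
```

Uppercase hex matches the CHECKSUM values in the fileSec excerpt. In this sketch the METS CHECKSUMTYPE strings `MD5` and `SHA-256` are passed straight to `MessageDigest.getInstance`, because they coincide with the JCA algorithm names.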

ThomasEdvardsen commented 6 months ago

We need this functionality for our work and have therefore started making changes for internal use. Perhaps you can benefit from the changes we have made in our fork? We are happy to contribute a pull request, but we do not yet have a full overview of the entire code base. The patch is against version 2.5.0.

https://github.com/keeps/commons-ip/compare/2.5.0...NationalLibraryOfNorway:commons-ip:2.5.0-checksum-patch?expand=1