Closed by johnguirgis 3 years ago
There are two files in the source directory that are identical:
5> diff -s data/5/4/9/OBJ.jpg data/6/5/10/OBJ.jpg
Files data/5/4/9/OBJ.jpg and data/6/5/10/OBJ.jpg are identical
simeon@RottenApple 5> shasum -a 512 data/5/4/9/OBJ.jpg data/6/5/10/OBJ.jpg
07c9212cf01a37532499a721d515c418f1128e4cc1aff3c195030e7b5ebf7d5e667482277a6b27cf6ce3480225796694f8ca89c6616e0de2718be617fe1e106a data/5/4/9/OBJ.jpg
07c9212cf01a37532499a721d515c418f1128e4cc1aff3c195030e7b5ebf7d5e667482277a6b27cf6ce3480225796694f8ca89c6616e0de2718be617fe1e106a data/6/5/10/OBJ.jpg
Thus, in the OCFL object, usually only one copy is stored. The inventory lists both logical files in the state under the one digest:
...
"versions": {
"v1": {
"created": "2021-08-12T13:05:03.056950Z",
"message": "hello",
"state": {
"07c9212cf01a37532499a721d515c418f1128e4cc1aff3c195030e7b5ebf7d5e667482277a6b27cf6ce3480225796694f8ca89c6616e0de2718be617fe1e106a": [
"data/5/4/9/OBJ.jpg",
"data/6/5/10/OBJ.jpg"
],
...
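As a rough illustration of how identical files end up under a single digest, here is a minimal Python sketch that walks a source directory and groups logical paths by SHA-512. The function names `sha512_of` and `group_by_digest` are hypothetical, and this is not the code `ocfl-object.py` uses. It only mirrors the `shasum` check above.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha512_of(path, chunk_size=65536):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(srcdir):
    """Map each digest to the sorted list of logical paths that share it.

    Hypothetical helper: logical paths are taken relative to srcdir,
    with forward slashes, similar in shape to a version "state" block.
    """
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(srcdir):
        for name in files:
            full = os.path.join(root, name)
            logical = os.path.relpath(full, srcdir).replace(os.sep, "/")
            groups[sha512_of(full)].append(logical)
    return {digest: sorted(paths) for digest, paths in groups.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print only the digests shared by more than one file.
    for digest, paths in group_by_digest(sys.argv[1]).items():
        if len(paths) > 1:
            print(digest[:16], paths)
```

Run against the source directory (for example, `5`), this should report `data/5/4/9/OBJ.jpg` and `data/6/5/10/OBJ.jpg` under the same digest, which is why the state lists both paths for one entry.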
The ocfl-object.py code has a flag --no-dedupe to avoid this deduping. I'm not sure why one would want to use it in production, but as a test it does show that an object with duplicate content can be created:
> ocfl-object.py --create --no-dedupe --srcdir 5 --id 5:5 --objdir ./ocfl_obj --name name --message hello --address a@company.com
INFO:ocfl.object:Created OCFL object 5:5 in ./ocfl_obj
> ls -l ocfl_obj/v1/content/data/5/4/9/OBJ.jpg ocfl_obj/v1/content/data/6/5/10/OBJ.jpg
-rw-r--r-- 1 simeon staff 70727 Aug 12 10:41 ocfl_obj/v1/content/data/5/4/9/OBJ.jpg
-rw-r--r-- 1 simeon staff 70727 Aug 12 10:41 ocfl_obj/v1/content/data/6/5/10/OBJ.jpg
> more ocfl_obj/inventory.json
{
"digestAlgorithm": "sha512",
"head": "v1",
"id": "5:5",
"manifest": {
"07c9212cf01a37532499a721d515c418f1128e4cc1aff3c195030e7b5ebf7d5e667482277a6b27cf6ce3480225796694f8ca89c6616e0de2718be617fe1e106a": [
"v1/content/data/5/4/9/OBJ.jpg",
"v1/content/data/6/5/10/OBJ.jpg"
],
...
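To see where the bytes for a given logical file actually live, one can follow the digest from the version state into the manifest. The sketch below assumes the inventory has already been loaded into a dict (for example, with `json.load`). The function name `content_paths_for`, the shortened digest, and the example inventory are illustrative only. The example models the default deduplicated case, in which only the `5/4/9` copy is stored, as the question describes.

```python
def content_paths_for(inventory, logical_path, version=None):
    """Return the manifest content paths backing a logical path.

    Hypothetical helper: looks up the digest for logical_path in the
    given version's state (default: head), then returns the manifest
    entry for that digest.
    """
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    for digest, logical_paths in state.items():
        if logical_path in logical_paths:
            return inventory["manifest"][digest]
    raise KeyError(f"{logical_path!r} not found in state of {version}")


# Illustrative inventory with a placeholder digest, not a real one.
EXAMPLE_INVENTORY = {
    "head": "v1",
    "manifest": {
        "abc123": ["v1/content/data/5/4/9/OBJ.jpg"],
    },
    "versions": {
        "v1": {
            "state": {
                "abc123": ["data/5/4/9/OBJ.jpg", "data/6/5/10/OBJ.jpg"],
            }
        }
    },
}

# Both logical files resolve to the same stored content file.
print(content_paths_for(EXAMPLE_INVENTORY, "data/6/5/10/OBJ.jpg"))
```

In the deduplicated object, the logical file `data/6/5/10/OBJ.jpg` is not missing. It is recorded in the state and is backed by the single stored copy.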
Note also that the command you used is not interpreting the source as a BagIt bag but just as a directory structure. To interpret it as a bag, use --srcbag instead of --srcdir, e.g.:
> ocfl-object.py --create --srcbag 5 --id 5:5 --objdir ./ocfl_obj --name name --message hello --address a@company.coma
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/node.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/node_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/node.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/node_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/node.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/node_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/4/media.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/4/media_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/5/media.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/5/media_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/6/media.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/6/media_en.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/4/9/OBJ.jpg
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/4/9/file.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/5/4/9/file.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/5/10/OBJ.jpg
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/5/10/file.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/6/5/10/file.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/6/11/OBJ.jpg
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/6/11/file.json
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/data/7/6/11/file.jsonld
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/bag-info.txt
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/bagit.txt
INFO:bagit:Verifying checksum for file /Users/simeon/Downloads/dd/5/manifest-sha1.txt
INFO:ocfl.object:Created OCFL object 5:5 in ./ocfl_obj
I am trying to create an OCFL object from a bag that was generated from some online content. I used the following command, which generated this OCFL object. The issue is that one of the files from the bag (5/data/6/5/10/OBJ.jpg) is not being added into the corresponding directory (ocfl_obj/v1/content/data/6/5/10) in the OCFL object. Any guidance on why this might be or how to resolve it?