Closed lwrubel closed 2 years ago
Structural metadata can be created at this step, which will allow the content-metadata step to be removed from the wasCrawlPreassembly workflow.
The current structural metadata for crawls looks like the snippet below from this example:
"contains": [
  {
    "type": "https://cocina.sul.stanford.edu/models/resources/file",
    "externalIdentifier": "bb929zb5539_1",
    "label": "",
    "version": 1,
    "structural": {
      "contains": [
        {
          "type": "https://cocina.sul.stanford.edu/models/file",
          "externalIdentifier": "https://cocina.sul.stanford.edu/file/d5d8285b-74f3-462e-8a97-6b268ed73363",
          "label": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
          "filename": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
          "size": 3024482,
          "version": 1,
          "hasMimeType": "application/warc",
          "hasMessageDigests": [
            {
              "type": "sha1",
              "digest": "7cdbd7bd50248bb627929c3dd103ad9e51d2d3a0"
            },
            {
              "type": "md5",
              "digest": "de89fb13e94dd21b96ddc25d6103c3df"
            }
          ],
          "access": {
            "view": "dark",
            "download": "none",
            "controlledDigitalLending": false
          },
          "administrative": {
            "publish": false,
            "sdrPreserve": true,
            "shelve": true
          }
        }
      ]
    }
  }
  ...
Checksums will need to be generated for the WARCs being registered.
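As a rough illustration of the checksum step, here is a minimal Python sketch that computes the SHA-1 and MD5 digests of a WARC file in one pass and returns them in the same shape as the `hasMessageDigests` array in the snippet above. The function name `warc_message_digests` and the `chunk_size` parameter are hypothetical, chosen for this example only; this is not the project's actual implementation, which may use a different language and approach.

```python
import hashlib


def warc_message_digests(path, chunk_size=1024 * 1024):
    """Return sha1 and md5 digests for the file at `path`.

    Hypothetical helper: the output mirrors the "hasMessageDigests"
    entries shown in the structural metadata example.
    """
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so a large WARC is never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return [
        {"type": "sha1", "digest": sha1.hexdigest()},
        {"type": "md5", "digest": md5.hexdigest()},
    ]
```

Updating both hash objects from the same chunk means each file is read from disk only once, which is likely to matter for large crawl files.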