zimeon / ocfl-py

OCFL tools in Python
MIT License

Harvard 2: List files in individual object #106

Open awoods opened 1 year ago

awoods commented 1 year ago

As a part of our bulk download process, we would like to pull down individual OCFL objects from S3 to local disk, then use ocfl-py to inspect and pull out specific files.

This will involve three new functions in ocfl-py:

  1. load individual object
  2. list files in object (optional version arg, default head)
  3. get content (arg: logical path, optional version arg)

This issue is to design the CLI interaction for step 2.
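To make the intended behaviour of steps 1 and 2 concrete, here is a minimal sketch that reads an object's root inventory.json directly with the Python standard library. It is not ocfl-py's API: the function names load_inventory and list_files are hypothetical. It relies only on the inventory layout defined by the OCFL specification (head, manifest, and versions[...].state), and assumes digests are written with the same case in the manifest and the version state.

```python
import json
from pathlib import Path


def load_inventory(objdir):
    """Read the root inventory.json of a single OCFL object into a dict.

    Hypothetical helper, not part of ocfl-py.
    """
    with open(Path(objdir) / "inventory.json", encoding="utf-8") as fh:
        return json.load(fh)


def list_files(inventory, version=None):
    """Map each logical path in a version to a content path in the object.

    version defaults to the inventory's head version.
    Assumes digests use the same case in manifest and state.
    """
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    manifest = inventory["manifest"]
    files = {}
    for digest, logical_paths in state.items():
        # Every content path listed for a digest holds the same bytes,
        # so the first one is enough to locate the content.
        content_path = manifest[digest][0]
        for logical_path in logical_paths:
            files[logical_path] = content_path
    # Sort by logical path so the listing is stable.
    return dict(sorted(files.items()))
```

With this shape, list_files(load_inventory(objdir)) would list the head version, and passing a version name such as "v2" as the second argument would list that version instead.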

zimeon commented 1 year ago

We have

ocfl-py> ./ocfl-object.py --show --objdir fixtures/1.1/good-objects/spec-ex-full
WARNING:ocfl.object:OCFL v1.1 Object at fixtures/1.1/good-objects/spec-ex-full has VALID STRUCTURE (DIGESTS NOT CHECKED)
WARNING:ocfl.object:Object tree
[fixtures/1.1/good-objects/spec-ex-full]
├── 0=ocfl_object_1.1 
├── inventory.json 
├── inventory.json.sha512 
├── v1 
│   ├── content (3 files)
│   ├── inventory.json 
│   └── inventory.json.sha512 
├── v2 
│   ├── content (1 files)
│   ├── inventory.json 
│   └── inventory.json.sha512 
└── v3 
    ├── inventory.json 
    └── inventory.json.sha512 

This shows the object structure, but judging from the help text I intended it to show files as well:

ocfl-py> ./ocfl-object.py --h
...
  --show                Show versions and files in an OCFL object (default: False)

Maybe the change should make --show actually list the files, and add a parameter to select a particular object version (note that --version is already taken to show the program version). So maybe:

Thoughts?

awoods commented 1 year ago

Similar to my updated comment in https://github.com/zimeon/ocfl-py/issues/105, the "list files" functionality would ideally be available in-memory. That is, I would like to create a utility that loads a standalone OCFL object into memory, programmatically iterates through the files of that object, and cherry-picks individual files to write to a different directory (with a different filename).
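A rough sketch of the kind of utility described here, using only the standard library rather than any existing ocfl-py call. The names iter_files and export_selected are hypothetical, as is the rename callback. The sketch assumes the object is already on local disk and that digests are written with the same case in the manifest and the version state.

```python
import json
import shutil
from pathlib import Path


def iter_files(objdir, version=None):
    """Yield (logical_path, source_path) pairs for one version of an object.

    Hypothetical helper: reads the root inventory.json directly.
    version defaults to the inventory's head version.
    """
    objdir = Path(objdir)
    inventory = json.loads((objdir / "inventory.json").read_text(encoding="utf-8"))
    version = version or inventory["head"]
    for digest, logical_paths in inventory["versions"][version]["state"].items():
        # The first content path for a digest is enough to locate the bytes.
        source = objdir / inventory["manifest"][digest][0]
        for logical_path in logical_paths:
            yield logical_path, source


def export_selected(objdir, dest_dir, rename, version=None):
    """Copy the files chosen by rename() into dest_dir under new names.

    rename(logical_path) returns the new filename, or None to skip the file.
    Returns the list of paths written.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for logical_path, source in iter_files(objdir, version):
        new_name = rename(logical_path)
        if new_name is None:
            continue
        target = dest_dir / new_name
        shutil.copyfile(source, target)
        written.append(target)
    return written
```

Driving the selection through a callback keeps the iteration in one place while leaving both the choice of files and the new filenames to the caller.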