zimeon / ocfl-py

OCFL tools in Python
MIT License

Harvard 3: Get content/files from individual object #107

Closed awoods closed 2 weeks ago

awoods commented 1 year ago

As part of our bulk download process, we would like to pull individual OCFL objects down from S3 to local disk, then use ocfl-py to inspect them and pull out specific files.

This will involve three new functions in ocfl-py:

  1. load individual object
  2. list files in object (optional version arg, default head)
  3. get content (arg: logical path, optional version arg)

This issue is to design the CLI interaction for step 3.
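
For orientation, the three functions listed above can be sketched directly against an object's `inventory.json`, using the `head`, `manifest` and `versions`/`state` blocks defined by the OCFL specification. This is only a minimal illustration using the standard library; the function names `load_inventory`, `list_files` and `get_content_path` are hypothetical and are not the ocfl-py API.

```python
import json
from pathlib import Path


def load_inventory(objdir):
    """Load the root inventory.json of the OCFL object stored at objdir."""
    with open(Path(objdir) / "inventory.json", encoding="utf-8") as fh:
        return json.load(fh)


def list_files(inventory, version=None):
    """Return the sorted logical paths in a version (default: head)."""
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    return sorted(path for paths in state.values() for path in paths)


def get_content_path(inventory, logical_path, version=None):
    """Map a logical path in a version to a content path relative to the object root."""
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    for digest, logical_paths in state.items():
        if logical_path in logical_paths:
            # The manifest maps each digest to one or more content paths;
            # all of them hold the same bytes, so the first one is used.
            # Assumption: digests use the same letter case in state and manifest.
            return inventory["manifest"][digest][0]
    raise KeyError(f"{logical_path} not found in {version}")
```

With these in place, the version argument is optional in both lookups and falls back to the `head` recorded in the inventory.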

zimeon commented 1 year ago

The current extract functionality (broken for v1.1, see https://github.com/zimeon/ocfl-py/issues/110) supports only extraction of all content:

ocfl-py> ./ocfl-object.py --obj fixtures/1.0/good-objects/spec-ex-full --extract v2 --dstdir /tmp/aaa
INFO:ocfl.object:Extracted v2 into /tmp/aaa
Extracted content for v2 in /tmp/aaa
ocfl-py> tree /tmp/aaa
/tmp/aaa
├── empty.txt
├── empty2.txt
└── foo
    └── bar.xml

1 directory, 3 files

I assume that in the absence of a version argument it should extract the latest version?

We need at least one more parameter to specify the logical path to extract.

zimeon commented 2 weeks ago

https://github.com/zimeon/ocfl-py/blob/main/docs/demo_ocfl_object_script.md#54-extract-foobarxml-of-v3-into-a-new-directory:

5.4 Extract foo/bar.xml of v3 into a new directory

> python ocfl-object.py extract --objver v3 --objdir fixtures/1.1/good-objects/spec-ex-full --logical-path foo/bar.xml --dstdir tmp/files -v
Extracted foo/bar.xml in v3 to tmp/files

and the extracted file is:

> find tmp/files -print
tmp/files
tmp/files/bar.xml
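
The behaviour shown in this demo, where a single logical path is copied into the destination directory under its base name, could be approximated as below. This is a rough sketch written against the OCFL inventory layout, not the code ocfl-py actually runs; `extract_file` is a hypothetical name, and defaulting to the head version when none is given is an assumption.

```python
import json
import shutil
from pathlib import Path


def extract_file(objdir, logical_path, dstdir, version=None):
    """Copy one logical file from an OCFL object into dstdir; return the new path."""
    objdir = Path(objdir)
    inventory = json.loads((objdir / "inventory.json").read_text(encoding="utf-8"))
    # Assumption: fall back to the head version when no version is given.
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    for digest, logical_paths in state.items():
        if logical_path in logical_paths:
            # Content paths in the manifest are relative to the object root.
            src = objdir / inventory["manifest"][digest][0]
            break
    else:
        raise KeyError(f"{logical_path} not found in {version}")
    # As in the demo output, only the base name is kept under dstdir.
    dst = Path(dstdir) / Path(logical_path).name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

Called as `extract_file("fixtures/1.1/good-objects/spec-ex-full", "foo/bar.xml", "tmp/files", version="v3")`, this would write `tmp/files/bar.xml`, matching the listing above.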