Closed awoods closed 2 weeks ago
This is the function I'm not sure about: where is the object loaded to/from?
Actually, I should probably rephrase this ticket (and the other two: https://github.com/zimeon/ocfl-py/issues/106 & https://github.com/zimeon/ocfl-py/issues/107) to remove the "CLI" design comment. I would like to load an object and interact with it by using ocfl-py as an imported library. For this ticket, I would like to load the object into memory as a Python object.
@awoods - I don't think loading an object into memory makes much sense; that could be really big! I certainly understand loading an inventory associated with an object on storage. Currently one can load the inventory and get a dict based on the parsed JSON:
>>> import ocfl
>>> object = ocfl.Object(path="fixtures/1.1/good-objects/spec-ex-full")
>>> inv = object.parse_inventory()
>>> inv
{'digestAlgorithm': 'sha512', 'fixity': {'md5': {'184f84e28cbe75e050e9c25ea7f2e939': ['v1/content/foo/bar.xml'], '2673a7b11a70bc7ff960ad8127b4adeb': ['v2/content/foo/bar.xml'], 'c289c8ccd4bab6e385f5afdd89b5bda2': ['v1/content/image.tiff'], 'd41d8cd98f00b204e9800998ecf8427e': ['v1/content/empty.txt']}, 'sha1': {'66709b068a2faead97113559db78ccd44712cbf2': ['v1/content/foo/bar.xml'], 'a6357c99ecc5752931e133227581e914968f3b9c': ['v2/content/foo/bar.xml'], 'b9c7ccc6154974288132b63c15db8d2750716b49': ['v1/content/image.tiff'], 'da39a3ee5e6b4b0d3255bfef95601890afd80709': ['v1/content/empty.txt']}}, 'head': 'v3', 'id': 'ark:/12345/bcd987', 'manifest': {'4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53': ['v2/content/foo/bar.xml'], '7dcc352f96c56dc5b094b2492c2866afeb12136a78f0143431ae247d02f02497bbd733e0536d34ec9703eba14c6017ea9f5738322c1d43169f8c77785947ac31': ['v1/content/foo/bar.xml'], 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e': ['v1/content/empty.txt'], 'ffccf6baa21809716f31563fafb9f333c09c336bb7400088f17e4ff307f98fc9b14a577f92f3285913b7f53a6d5cf004503cf839aada1c885ac69336cbfb862e': ['v1/content/image.tiff']}, 'type': 'https://ocfl.io/1.1/spec/#inventory', 'versions': {'v1': {'created': '2018-01-01T01:01:01Z', 'message': 'Initial import', 'state': {'7dcc352f96c56dc5b094b2492c2866afeb12136a78f0143431ae247d02f02497bbd733e0536d34ec9703eba14c6017ea9f5738322c1d43169f8c77785947ac31': ['foo/bar.xml'], 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e': ['empty.txt'], 'ffccf6baa21809716f31563fafb9f333c09c336bb7400088f17e4ff307f98fc9b14a577f92f3285913b7f53a6d5cf004503cf839aada1c885ac69336cbfb862e': ['image.tiff']}, 'user': {'address': 'mailto:alice@example.com', 'name': 'Alice'}}, 'v2': {'created': '2018-02-02T02:02:02Z', 'message': 
'Fix bar.xml, remove image.tiff, add empty2.txt', 'state': {'4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53': ['foo/bar.xml'], 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e': ['empty.txt', 'empty2.txt']}, 'user': {'address': 'mailto:bob@example.com', 'name': 'Bob'}}, 'v3': {'created': '2018-03-03T03:03:03Z', 'message': 'Reinstate image.tiff, delete empty.txt', 'state': {'4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53': ['foo/bar.xml'], 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e': ['empty2.txt'], 'ffccf6baa21809716f31563fafb9f333c09c336bb7400088f17e4ff307f98fc9b14a577f92f3285913b7f53a6d5cf004503cf839aada1c885ac69336cbfb862e': ['image.tiff']}, 'user': {'address': 'mailto:cecilia@example.com', 'name': 'Cecilia'}}}}
>>> inv['digestAlgorithm']
'sha512'
>>> inv['versions']['v1']
{'created': '2018-01-01T01:01:01Z', 'message': 'Initial import', 'state': {'7dcc352f96c56dc5b094b2492c2866afeb12136a78f0143431ae247d02f02497bbd733e0536d34ec9703eba14c6017ea9f5738322c1d43169f8c77785947ac31': ['foo/bar.xml'], 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e': ['empty.txt'], 'ffccf6baa21809716f31563fafb9f333c09c336bb7400088f17e4ff307f98fc9b14a577f92f3285913b7f53a6d5cf004503cf839aada1c885ac69336cbfb862e': ['image.tiff']}, 'user': {'address': 'mailto:alice@example.com', 'name': 'Alice'}}
Agreed, loading the inventory into memory versus the entire object with content files makes sense. Our use cases for this involve an OCFL repository that is created and managed by a separate application and we need a Python library to help read/inspect existing OCFL objects.
For example, we have the need to ask an OCFL object (or its inventory) for the "content path" of a specific file for which we have the "logical path". We do not know/care in which version the file was created nor do we know/care if the logical file was de-duplicated.
Ideally, the Python code importing an ocfl-py module would not need to parse/understand the underlying JSON structure of OCFL inventories.
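To make the lookup concrete, here is a rough sketch of what a caller has to do today when working directly against the parsed inventory JSON. It relies only on the inventory layout visible in the output above (`versions` → `state` maps digests to logical paths, `manifest` maps digests to content paths). The function name `content_paths_for` and the toy inventory with shortened digests are made up for illustration and are not part of ocfl-py. It also assumes the digests in `state` and `manifest` use the same letter case.

```python
def content_paths_for(inventory, version, logical_path):
    """Return the content paths holding `logical_path` in `version`, or []."""
    state = inventory["versions"][version]["state"]
    for digest, logical_paths in state.items():
        if logical_path in logical_paths:
            # The manifest maps the same digest to where the bytes live on disk.
            return inventory["manifest"][digest]
    return []


# Toy inventory (hypothetical, digests shortened): copy.txt was added in v2
# but is de-duplicated against content stored under v1.
inventory = {
    "head": "v2",
    "manifest": {"abc": ["v1/content/foo.txt"], "def": ["v2/content/bar.txt"]},
    "versions": {
        "v1": {"state": {"abc": ["foo.txt"]}},
        "v2": {"state": {"abc": ["foo.txt", "copy.txt"], "def": ["bar.txt"]}},
    },
}
print(content_paths_for(inventory, inventory["head"], "copy.txt"))
# prints ['v1/content/foo.txt']
```

This is exactly the digest indirection that calling code would ideally not need to know about.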
For the specific case of finding a logical path in some version of an object, a search like the following would work (close to #105):
>>> import ocfl
>>> obj = ocfl.Object(path="fixtures/1.1/good-objects/spec-ex-full")
>>> inv = obj.parse_inventory()
>>> logical_path = "empty.txt"
>>> for vdir in reversed(inv.version_directories):
...     if logical_path in inv.version(vdir).logical_paths:
...         print("Found %s in %s with content in %s" % (logical_path, vdir, inv.version(vdir).content_path_for_logical_path(logical_path)))
...         break
...
Found empty.txt in v2 with content in v1/content/empty.txt
I guess I'm open to creating something like the following, which searches backward through versions to find the last version and content path for a specified logical path:
vdir, content_path = inv.find_logical_path("empty.txt")
if that is of particular interest
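For anyone working from the plain parsed JSON rather than the library objects, a backward search of this kind could be sketched roughly as below. This is not ocfl-py's implementation: the name `find_logical_path_sketch` and the toy inventory with shortened digests are hypothetical. The sketch returns only the first content path listed in the manifest for a digest, and assumes version directory names are `v` followed by an integer (optionally zero-padded).

```python
def find_logical_path_sketch(inventory, logical_path):
    """Search versions newest-first; return (version_dir, content_path) or None."""
    # Sort numerically so that v10 comes after v9 and zero-padded names still work.
    vdirs = sorted(inventory["versions"], key=lambda v: int(v[1:]), reverse=True)
    for vdir in vdirs:
        for digest, paths in inventory["versions"][vdir]["state"].items():
            if logical_path in paths:
                # Assumption: take the first content path recorded for this digest.
                return vdir, inventory["manifest"][digest][0]
    return None


# Toy inventory (hypothetical): empty.txt exists in v1 and v2, is deleted in v3.
inventory = {
    "manifest": {
        "d-empty": ["v1/content/empty.txt"],
        "d-image": ["v1/content/image.tiff"],
    },
    "versions": {
        "v1": {"state": {"d-empty": ["empty.txt"], "d-image": ["image.tiff"]}},
        "v2": {"state": {"d-empty": ["empty.txt", "empty2.txt"]}},
        "v3": {"state": {"d-empty": ["empty2.txt"], "d-image": ["image.tiff"]}},
    },
}
print(find_logical_path_sketch(inventory, "empty.txt"))
# prints ('v2', 'v1/content/empty.txt')
```

Returning `None` for a path that never existed keeps the caller's check simple; raising an exception would be an equally reasonable choice.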
Added the method in #129, closing.
Further suggestions welcome
Thanks, @zimeon !
As a part of our bulk download process, we would like to pull down individual OCFL objects from S3 to local disk, then use ocfl-py to inspect and pull out specific files.
This will involve three new functions in ocfl-py:
This issue is to design the CLI interaction for step 1.