jic-dtool / dtool-s3

S3 backend for dtool
MIT License
1 stars 3 forks source link

Align admin_metadata types with behvior of dtoolcore.storagebroker.BaseStorageBroker #13

Closed jotelha closed 3 years ago

jotelha commented 3 years ago

In the default behavior of BaseStorageBroker, metadata values are evaluated as JSON when read via get_admin_metadata. This returns the timestamp fields created_at and frozen_at as floats. The s3 storage broker https://github.com/jic-dtool/dtool-s3/blob/b0b98c9200865ba263d3a1db55b78cc844717935/dtool_s3/storagebroker.py#L483-L491 deviates from this behavior and returns the metadata object as retrieved via boto3. This apparently may yield the timestamp fields as strings, see below comparison between s3 and other storage broker.

>>> import dtoolcore
>>> from dtool_cli.cli import (
...     CONFIG_PATH,
... )
>>> smb_admin_metadata = dtoolcore._admin_metadata_from_uri('smb://jh1130/001b8a13-3db5-4e0a-bb19-d8e6a8a44ad6', CONFIG_PATH)
>>> smb_admin_metadata
{'uuid': '001b8a13-3db5-4e0a-bb19-d8e6a8a44ad6', 'dtoolcore_version': '3.17.0', 'name': '2020-09-14-19-40-26-128617-n-844-m-844-s-monolayer-substratepassivation', 'type': 'dataset', 'creator_username': 'hoermann4', 'created_at': 1600120598.214282, 'frozen_at': 1600831276.888873}
>>> s3_admin_metadata = dtoolcore._admin_metadata_from_uri('s3://frct-simdata/001b8a13-3db5-4e0a-bb19-d8e6a8a44ad6', CONFIG_PATH)
>>> s3_admin_metadata
{'created_at': '1600120598.214282', 'creator_username': 'hoermann4', 'dtoolcore_version': '3.17.0', 'frozen_at': '1600831276.888873', 'name': '2020-09-14-19-40-26-128617-n-844-m-844-s-monolayer-substratepassivation', 'type': 'dataset', 'uuid': '001b8a13-3db5-4e0a-bb19-d8e6a8a44ad6'}
>>> type(smb_admin_metadata['frozen_at'])
<class 'float'>
>>> type(s3_admin_metadata['frozen_at'])
<class 'str'>

This PR aligns the behavior to the dtoolcore reference implementation.

tjelvar-olsson commented 3 years ago

I believe the dtool "truth" about these values come from the dtoolcore.utils.timestamp utilility function: https://github.com/jic-dtool/dtoolcore/blob/e976c426bef01cc9abdc985e36eae3695d8ee384/dtoolcore/utils.py#L244-L251

I've seen these "symptoms" before. Mainly when working with the dtool-lookup-server. Hence lines like:

I always thought that I saw these things because of "old" datasets where the API had not stabilised on using floats to represent these timestamps. However, since you don't have any datasets from early versions of dtool it suggests that there is something else is at play.

It looks like this is the issue:

https://github.com/jic-dtool/dtool-s3/blob/b0b98c9200865ba263d3a1db55b78cc844717935/dtool_s3/storagebroker.py#L476

The code in the above looks very intentional and now I'm starting to remember that this is because the admin metadata is stored as native S3 metadata, rather than as a file. That is something that comes from the original implementation of this storagebroker:

https://github.com/jic-dtool/dtool-s3/blob/89ff0669675537891ba603e93ba3d93a9332ace1/dtool_s3/storagebroker.py#L48

@mrmh2 can you confirm if my memory of the above is correct?

@jotelha I think the solution you have implemented is neat. However, for performance reasons I would prefer if we removed the for loop and used two explicit calls to force the conversion of the created_at and frozen_at values to floats. What do you think?

For backwards compatibility I also think that we need to have some logic to deal with datasets that are missing the created_at attribute. This was "only" introduced in dtool version 3.2 back in May 2018 https://github.com/jic-dtool/dtoolcore/blob/master/CHANGELOG.rst#320---2018-05-18

jotelha commented 3 years ago

Is the last commit to your taste, @tjelvar-olsson ? ;)

tjelvar-olsson commented 3 years ago

Have not had the chance to look yet. I'll try and find the time tonight.

tjelvar-olsson commented 3 years ago

@jotelha thank you! :)

When I ran the tests I found that we also needed conditional logic to check if frozen_at key is present in the admin_metadata. This is needed for the cases when one wants to retrieve the admin_metadata from a proto dataset.

Fixed in 7594285