informatics-isi-edu / hatrac

Simple object storage for collaborations
Apache License 2.0

S3 functional update, redirect support, migration tools. #57

Closed mikedarcy closed 3 years ago

mikedarcy commented 3 years ago

This PR includes updates for the S3 backend along with the implementation of a generic redirection response that can be used for various scenarios, such as:

  1. Trickle migration from one Hatrac instance to another.
  2. Maintaining "sparse" copies of assets where some objects are managed by the instance and others are remote links to be redirected.
  3. Creating S3 presigned URLs and redirecting clients to the S3-native URL, rather than proxying the transfer from S3 on behalf of the client.
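The redirect scenarios above can be sketched as a small decision function. This is only an illustration, not Hatrac's actual code: `StoredObject`, `plan_response`, the preference for local content over a remote link, and the choice of `303` as the redirect status are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StoredObject:
    """Hypothetical record for one object; not Hatrac's real data model."""
    name: str
    has_local_content: bool
    aux_url: Optional[str] = None  # remote location to redirect to, if any


def plan_response(obj: StoredObject) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET request on obj."""
    if obj.has_local_content:
        # Object is managed by this instance: serve it directly.
        return 200, {}
    if obj.aux_url:
        # Object is a remote link (not yet migrated, or a sparse copy).
        # The 303 status is an assumption; the PR only says "redirection response".
        return 303, {"Location": obj.aux_url}
    # Neither local content nor a remote link is known.
    return 404, {}


if __name__ == "__main__":
    remote = StoredObject("/hatrac/data/b", False, "https://old.example.org/hatrac/data/b")
    print(plan_response(remote))
```

Under these assumptions, a trickle migration would gradually flip objects from the redirect branch to the local branch as their content is transferred.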

Two new utilities are provided: hatrac-migrate and hatrac-utils. Both are CLI tools meant to be invoked under the privileged hatrac system account, directly on the server instance they are intended to operate against.

  1. The hatrac-migrate tool provides functions for bulk setting the aux URL metadata element for use with redirection to a specified source server, along with the ability to transfer resources from that source server to the local instance for partial or full object migration.
  2. The hatrac-utils tool is meant to be a generic tool for performing maintenance functions against the local server instance. Currently, only one function is implemented: del-jobs, which can be used both to clean up incomplete transfer jobs older than a specified number of days and to purge any multipart uploads from the S3 backend that may (for whatever reason) have become orphaned from job entries in the database.
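The two selection rules behind del-jobs can be illustrated roughly as follows. This is a minimal sketch and not the tool's implementation: the function names, the `(job_id, created)` tuples, and the plain lists of upload IDs are hypothetical stand-ins for whatever the database and the S3 backend actually return.

```python
from datetime import datetime, timedelta, timezone


def stale_jobs(jobs, max_age_days, now=None):
    """Return IDs of incomplete jobs created more than max_age_days ago.

    jobs is an iterable of (job_id, created) pairs with timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [job_id for job_id, created in jobs if created < cutoff]


def orphaned_uploads(s3_upload_ids, db_upload_ids):
    """Return multipart upload IDs present in S3 but unknown to the database."""
    return sorted(set(s3_upload_ids) - set(db_upload_ids))


if __name__ == "__main__":
    now = datetime(2021, 6, 30, tzinfo=timezone.utc)
    jobs = [
        ("job-old", datetime(2021, 6, 1, tzinfo=timezone.utc)),
        ("job-new", datetime(2021, 6, 29, tzinfo=timezone.utc)),
    ]
    print(stale_jobs(jobs, max_age_days=7, now=now))
    print(orphaned_uploads(["u1", "u2", "u3"], ["u2"]))
```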

Additional notes:

  1. S3 presigned URL behavior is governed by two parameters, configured at the path-to-bucket mapping level: presigned_url_size_threshold and presigned_url_expiration_secs. A presigned URL is generated only for objects whose size in bytes is greater than presigned_url_size_threshold; presigned_url_expiration_secs is the validity duration of the presigned URL, in seconds. To trigger S3 URL presigning, presigned_url_size_threshold must be set to an integer value greater than zero. The presigned_url_expiration_secs parameter is optional and defaults to 300 if not specified.

  2. An additional server configuration variable, read-only, has been added. When enabled, the server rejects PUT, POST, and DELETE methods with a 405 Method Not Allowed response. This setting is intended for use on the source server in a migration, to ensure that source data cannot be mutated while the migration is in progress.
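The rules in the two notes above can be expressed as a short sketch. The parameter names, the 300-second default, the blocked methods, and the 405 status come from the description; the function names and the dict-based bucket configuration are hypothetical and do not reflect how Hatrac reads its configuration.

```python
DEFAULT_EXPIRATION_SECS = 300  # default stated in the notes above

READ_ONLY_BLOCKED_METHODS = {"PUT", "POST", "DELETE"}


def presign_expiration(bucket_config, object_size):
    """Return the presigned URL lifetime in seconds, or None if no URL should be presigned.

    bucket_config is a hypothetical dict for one path-to-bucket mapping.
    """
    threshold = bucket_config.get("presigned_url_size_threshold")
    # Presigning is enabled only by an integer threshold greater than zero.
    if isinstance(threshold, bool) or not isinstance(threshold, int) or threshold <= 0:
        return None
    # The object must be strictly larger than the threshold.
    if object_size <= threshold:
        return None
    return bucket_config.get("presigned_url_expiration_secs", DEFAULT_EXPIRATION_SECS)


def read_only_status(method, read_only):
    """Return 405 if the method is rejected in read-only mode, else None."""
    if read_only and method.upper() in READ_ONLY_BLOCKED_METHODS:
        return 405
    return None


if __name__ == "__main__":
    config = {"presigned_url_size_threshold": 1024}
    print(presign_expiration(config, 4096))
    print(read_only_status("PUT", read_only=True))
```

Returning `None` for "do not presign" keeps the proxying path as the default whenever the threshold is missing or invalid, which matches the note that the threshold must be set explicitly to enable presigning.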

Future improvements:

  1. Concurrent transfers during the migration process.
  2. More control over the migration process via a configuration file. For instance, it may be desirable to enumerate a set of objects for transfer migration up-front, and have the remainder be redirected to the original source.
  3. Migration of objects and metadata from non-Hatrac sources, e.g. "in-place" creation of Hatrac objects from an existing S3 bucket or another storage system such as a Globus endpoint.