ropensci / gittargets

Data version control for reproducible analysis pipelines in R with {targets}.
https://docs.ropensci.org/gittargets/

Leverage existing AWS S3 targets #6

Closed: wlandau closed this issue 2 years ago

wlandau commented 2 years ago

Prework

Proposal

If this idea works, it will supersede #5 and #2.

{targets} can already upload data to AWS S3. As explained in https://books.ropensci.org/targets/cloud.html, a target's data is uploaded, downloaded, and tracked via an S3 bucket while the pipeline runs. If versioning is enabled in the bucket, we could revert S3 targets to earlier versions.

I propose enhancements to tar_git_snapshot() and tar_git_checkout() that should not require any changes to core targets.

tar_git_snapshot()

tar_git_snapshot() could call aws.s3::get_versions() to retrieve the version IDs of all objects in every bucket referenced in the current tar_meta() metadata. These IDs could be written to a file in _targets/gittargets/ and snapshotted along with the rest of the local data.

First, we would need to check that the user is actually using S3 and that versioning is enabled in the buckets (via aws.s3::get_versioning()).
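A minimal sketch of the snapshot step, written in Python only to show the data flow; it does not call AWS or aws.s3. It assumes hypothetical metadata records (dicts with `name`, `repository`, `bucket`, and `key` fields), a hypothetical `versioning_status` mapping standing in for what aws.s3::get_versioning() reports per bucket, and a hypothetical `latest_versions` lookup standing in for the IDs returned by aws.s3::get_versions(). The file name `aws_versions.json` is also an assumption.

```python
import json
import os


def build_snapshot(meta, versioning_status, latest_versions):
    """Map each S3-backed target to the version ID of its current object.

    meta: list of hypothetical metadata records (dicts).
    versioning_status: hypothetical {bucket: "Enabled" | "Suspended"} mapping.
    latest_versions: hypothetical {(bucket, key): version_id} mapping.
    """
    manifest = {}
    for record in meta:
        # Local targets are already covered by the git snapshot of the data store.
        if record.get("repository") != "aws":
            continue
        bucket, key = record["bucket"], record["key"]
        # Refuse to snapshot a bucket that does not keep old versions.
        if versioning_status.get(bucket) != "Enabled":
            raise ValueError(f"versioning is not enabled in bucket {bucket!r}")
        manifest[record["name"]] = {
            "bucket": bucket,
            "key": key,
            "version_id": latest_versions[(bucket, key)],
        }
    return manifest


def write_snapshot(manifest, store="_targets"):
    """Write the manifest under <store>/gittargets/ so git can commit it."""
    path = os.path.join(store, "gittargets", "aws_versions.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        # sort_keys keeps the file stable, so repeated snapshots give small git diffs.
        json.dump(manifest, handle, indent=2, sort_keys=True)
    return path
```

Failing early on an unversioned bucket matters here: without versioning, a recorded ID could not be used to recover the old object later.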

tar_git_checkout()

tar_git_checkout() could invoke aws.s3::copy_object() to promote a target's historical version to the current version, which tar_read() then retrieves. We would need to pass the version ID in the REST headers of the request that aws.s3::copy_object() sends. This should be possible in theory, but I have not tested it.
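For reference, the S3 REST API's CopyObject request names its source in the `x-amz-copy-source` header, and appending `?versionId=` selects one specific version; the AWS documentation describes restoring an old version by copying it into the same bucket, where it becomes the current version. Whether aws.s3::copy_object() can forward such a header is the untested part. A minimal Python sketch of building the header value, using a hypothetical helper name `copy_source_header()`:

```python
from urllib.parse import quote


def copy_source_header(bucket, key, version_id):
    """Build the x-amz-copy-source header for copying one version of bucket/key.

    Hypothetical helper: the key is URL-encoded (keeping "/" separators) and the
    version ID is appended as a ?versionId= query parameter.
    """
    source = f"{bucket}/{quote(key, safe='/')}?versionId={quote(version_id, safe='')}"
    return {"x-amz-copy-source": source}
```

In a checkout, the copy destination would be the same bucket and key as the source, so the copied bytes become the new current version while the newer versions stay in the bucket's history.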

Proof of concept

We should first test all these steps with a simple example in aws.s3:

  1. Create a versioned bucket.
  2. Upload a couple of versions while keeping track of their version IDs.
  3. Revert to the old version and check that it was restored successfully.
  4. Bring back the new version and test the same way.
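The four steps above can be rehearsed against a toy model before touching AWS. This Python sketch assumes restore-by-copy semantics (copying an old version onto the same key appends a new current version and keeps all history); the `VersionedBucket` class and its `put`, `get`, and `promote` methods are hypothetical and only mimic that behavior.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the AWS or aws.s3 API)."""

    def __init__(self):
        self._history = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Store a new version and return its (made-up) version ID."""
        version_id = f"v{next(self._ids)}"
        self._history.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        """Return the current body, or the body of one specific version."""
        versions = self._history[key]
        if version_id is None:
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id:
                return body
        raise KeyError(f"{key!r} has no version {version_id!r}")

    def promote(self, key, version_id):
        """Copy an old version on top of the key, making it current again."""
        return self.put(key, self.get(key, version_id))


# 1. Create a versioned bucket.
bucket = VersionedBucket()
# 2. Upload two versions and keep their IDs.
old_id = bucket.put("model", "fit from Monday")
new_id = bucket.put("model", "fit from Tuesday")
# 3. Revert to the old version and check it.
bucket.promote("model", old_id)
assert bucket.get("model") == "fit from Monday"
# 4. Bring back the new version and check it the same way.
bucket.promote("model", new_id)
assert bucket.get("model") == "fit from Tuesday"
```

Because promotion only appends, step 4 works even after step 3: the newer version is never deleted, just buried under a copy of the older one.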
wlandau commented 2 years ago

It doesn't work yet, but it should be possible. I asked about it at https://stackoverflow.com/questions/70071240/revert-to-an-old-version-of-an-object-in-a-versioned-bucket.

wlandau commented 2 years ago

Closing in favor of https://github.com/ropensci/targets/issues/711.

wlandau commented 2 years ago

Reopening. Compared with native AWS versioning in {targets}, an AWS backend for gittargets would require less frequent uploads and let users opt in later in a project's life cycle.

wlandau commented 2 years ago

Oops, I reopened the wrong issue.