rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Usage of S3 tags with pins? #628

Closed cnemarich closed 2 years ago

cnemarich commented 2 years ago

Question: Is there any way to upload objects to an S3 board with tags? If this is not possible, can it be added as a feature?

Use Case: My organization currently uses the {paws} package to upload shared datasets to an S3 bucket, where various team members use them. Other systems also read and ingest this data automatically, based on the tags each object carries. We would like to move to the {pins} package for managing these shared datasets, but we would need to keep the ability to write tags to objects as they are uploaded.

Additional Notes: The put_object method from the {paws} package that {pins} relies on allows tags to be passed along with uploaded objects. I took a look at the pin_write function and, while it does accept additional arguments, none of them seem to be passed on to the s3_upload_file helper function: https://github.com/rstudio/pins/blob/baaa304f876ec9cc98cc90d3333db157e34028f7/R/board_s3.R#L210-L220 and https://github.com/rstudio/pins/blob/baaa304f876ec9cc98cc90d3333db157e34028f7/R/board_s3.R#L270-L277

juliasilge commented 2 years ago

@cnemarich Looking at this initially, it seems like it would work best to pass this through the dots from pin_write() to pin_store() to s3_upload_file() to the paws put_object method. That would look like this:

board_sales <- board_s3("company-pins", prefix = "sales/")
board_sales %>% pin_write(mtcars, Tagging = "key1=value1&key2=value2")
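
The Tagging value in that example is a URL-encoded query string, which is the form the S3 PutObject operation expects for a tag set. As a rough illustration (not part of pins or paws), the sketch below builds and parses such a string from a plain mapping using only the Python standard library. The helper names `build_tagging` and `parse_tagging` are hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlencode


def build_tagging(tags):
    """Encode a mapping of tag keys to values as a URL query string.

    Hypothetical helper: percent-encodes keys and values so that
    characters such as spaces, '&' and '=' are safe inside the string.
    """
    return urlencode(tags, quote_via=quote)


def parse_tagging(tagging):
    """Decode a URL query string back into a dict of tags."""
    return dict(parse_qsl(tagging, keep_blank_values=True))


if __name__ == "__main__":
    tagging = build_tagging({"key1": "value1", "key2": "value2"})
    print(tagging)
    print(parse_tagging(tagging))
```

Building the string from a mapping avoids hand-escaping tag values that contain reserved characters.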

@machow It looks like the Python board_s3() doesn't support tags right now either. What do you think about putting it as an argument to pin_write() like this? Is there a different way that would work better for Python?

machow commented 2 years ago

Hey, thanks for looking through this. Passing through Tagging makes sense to me.

On the Python side, we'll need to figure out whether we want to subclass BaseBoard, which is not S3-specific, or make load_data a generic function that dispatches on the filesystem. (This should be quick to sort out.)
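
To make the second option concrete, here is a minimal sketch of a generic function that dispatches on the filesystem type, using `functools.singledispatch`. Everything in it is an assumption for illustration: `LocalFileSystem` and `S3LikeFileSystem` are stand-in classes, and `apply_tags` is a hypothetical name, not an existing pins function.

```python
from functools import singledispatch


class LocalFileSystem:
    """Stand-in for a filesystem with no notion of object tags."""


class S3LikeFileSystem:
    """Stand-in for an S3 filesystem; keeps tags in memory for the demo."""

    def __init__(self):
        self.tags = {}

    def put_tags(self, path, tags):
        self.tags[path] = dict(tags)


@singledispatch
def apply_tags(fs, path, tags):
    """Fallback for filesystems that cannot store tags: do nothing."""
    return False


@apply_tags.register
def _(fs: S3LikeFileSystem, path, tags):
    """S3-specific implementation: store the tags on the object."""
    fs.put_tags(path, tags)
    return True


if __name__ == "__main__":
    s3 = S3LikeFileSystem()
    print(apply_tags(s3, "bucket/pin.csv", {"tag1": "value1"}))
    print(apply_tags(LocalFileSystem(), "pin.csv", {"tag1": "value1"}))
```

With this shape, the board class stays filesystem-agnostic and only the registered implementation knows about tags. Subclassing BaseBoard would instead put the S3-specific logic in a dedicated board type.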

Here's what getting and setting tags looks like in s3fs (the fsspec implementation for S3):

# from the s3fs tests
def test_tags(s3):
    tagset = {'tag1': 'value1', 'tag2': 'value2'}
    fname = list(files)[0]
    s3.touch(fname)
    s3.put_tags(fname, tagset)
    assert s3.get_tags(fname) == tagset
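
Building on that test, a write path could upload the bytes and then tag the object. The sketch below assumes `fs` is an fsspec-style filesystem exposing `pipe_file`, `put_tags`, and `get_tags` (as `s3fs.S3FileSystem` does). `upload_with_tags` is a hypothetical helper, not part of pins or s3fs.

```python
def upload_with_tags(fs, path, data, tags):
    """Write bytes to `path`, then attach `tags` to the stored object.

    Hypothetical helper. `fs` is assumed to provide pipe_file(),
    put_tags(), and get_tags(), for example an s3fs.S3FileSystem.
    Returns the tags read back from the filesystem.
    """
    fs.pipe_file(path, data)   # upload the object contents
    fs.put_tags(path, tags)    # tag the object in a separate call
    return fs.get_tags(path)   # read the tags back to confirm


# Example (requires s3fs and valid AWS credentials; names are placeholders):
# import s3fs
# fs = s3fs.S3FileSystem()
# upload_with_tags(fs, "my-bucket/pins/data.csv", b"a,b\n1,2\n", {"tag1": "value1"})
```

Unlike passing Tagging to put_object, this tags the object after the upload rather than in the same request, so a consumer that reacts to new objects might briefly see an untagged object.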

juliasilge commented 2 years ago

@machow Let's plan for adding this feature after conf 👍

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.