rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
312 stars 63 forks source link

Add new `write_board_manifest()` function #661

Closed juliasilge closed 1 year ago

juliasilge commented 1 year ago

Related to #650

That PR is a bit sprawling so I think this will be better if we can break it up into some pieces. This PR implements part of that change, adding the new function board_manifest() and the internal generic to call for various boards.

I have not implemented this for board_rsconnect(), because it's going to be pretty complicated. (Maybe look up all pins that the specific user can access?)

I changed the function name since this operates on a board and returns the board (it doesn't act on a pin).

juliasilge commented 1 year ago

Currently one failure because we can't build the brand-new version of arrow on Windows.

@ijlyttle do you mind taking a look at this PR and seeing if you have feedback on the changes I made relative to your original work?

ijlyttle commented 1 year ago

I think this looks great; the naming makes perfect sense!

Sorry I let this get away from me a bit. I agree that reducing the scope of each bit will make it much easier to manage.

Thanks!

juliasilge commented 1 year ago

I think I got all the tests and additional documentation added here.

I don't think I'll open an issue about doing this for Connect until someone asks for it, as it is going to be a bear (just leave it non-implemented so default method gets called). As a reminder, this is the first ~half of what #650 aims for, and there will still be more to do for the overall workflow to be possible.

ijlyttle commented 1 year ago

Hi @juliasilge - just a quick note, I have updated my workflows to use this branch, everything is working great with _pins.yaml and write_board_manifest().

This should not be in the least bit surprising, but figured you might like to know.

Thanks!

machow commented 1 year ago

Sorry for the wait -- I'm going to double check quickly against pins python boards, but expect everything should work fine!

machow commented 1 year ago

@juliasilge it appears adding a pins.txt file as a manifest breaks pin_search on python pins. The reason is that in the python version pin_list() returns the filename, and pin_search attempts to retrieve metadata for every pin as part of search.

Here's the relevant R pins pins_search code fetching metadata (which works fine with the manifest file).

wdyt is a good move here? I can modify pins-python to ignore pins.txt files, but I'm wondering what the right way sequence is for releasing..

edit: how I tested

Using an .env file with credentials filled in from .env.dev.

In python:

from dotenv import load_dotenv
load_dotenv()
from pins import board_s3
from pins.tests.helpers import BoardBuilder

bb = BoardBuilder("s3")
board = bb.create_tmp_board("pins/tests/pins-compat")

# list path to temporary pins bucket, for copying into R
board.board

Switching to R:

board <- board_s3(<bucket_from_above>, prefix="<prefix_from_above>/")

# note that this also prints a warning
board %>% board_manifest()

Back in python:

board.pin_search("abc")
juliasilge commented 1 year ago

@machow In R, what worked best was making sure that pin_list() only ever returns directories. This is helpful in general in case people are putting other files in, say, an S3 bucket that holds pins. (Of course, doesn't help if there are also directories there, but at least a little better.)

What do you think about making the needed change in Python (whether that is special-casing _pins.yaml or only listing dirs) and releasing to PyPI before the next CRAN release for R? I think we could still merge here because the likelihood of someone using the development version of pins for R with pins for Python and doing pin_list() in Python seems low. Merging here let's us move forward with other related next steps.

Also, I just want to make sure you are working with the current head of this branch. You wrote board %>% board_manifest() but there hopefully is no function by that name exported with this branch currently (it's write_board_manifest().

machow commented 1 year ago

This is helpful in general in case people are putting other files in, say, an S3 bucket that holds pins.

I wonder if we should discourage people from putting other files/folders into a pin board? Ignoring files will work okay, but if people put folders in they'll get listed as pins (afaict the exact behavior varies from board to board).

For example, if a folder with a file in it (abc/xyz.txt) is inside of an S3 board, then pin_search will raise an error (because it will try to pull the metadata for a pin named abc):

image
machow commented 1 year ago

What do you think about making the needed change in Python (whether that is special-casing _pins.yaml or only listing dirs) and releasing to PyPI before the next CRAN release for R?

I agree merging here seems okay -- since the odds someone uses this and is also using python boards seems low! Maybe we could include a warning when they run write_board_manifest (or the relevant command)?

In any event, I'm on board with merging as is, and then removing _pins.yaml from python board results!

github-actions[bot] commented 1 year ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.