rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
312 stars 63 forks source link

Provide a helper function to download pins straight from Connect content/vanity URLs #616

Closed sellorm closed 1 year ago

sellorm commented 2 years ago

Right now, three pieces of information (perhaps four, depending on how you look at it) are required for reading a pin from RStudio Connect:

  1. The Connect server URL
  2. the full name of the pin ("username/pin_name" - You might count these separately. 🤷‍♀️)
  3. Your API key

This increases friction for Connect users since they're more accustomed to sharing either content URLs or vanity URLs. More pieces of information to share means more chance for errors.

If your objective is to read a pin from Connect with the least amount of friction, the above is too much information.

If we provided a function that could download directly from a Connect content URL (either a standard content URL, which are very long or a vanity URL), we would reduce the number of pieces of required information to two.

  1. The content/vanity URL
  2. Your API key

Something like:

library(pins)
read_pin_from_connect("https://connect.example.com/my_penguin_pin", api_key = "xxxxxxxxxxxxx")

This would be a convenience function to load the latest version of the pin from Connect only. It would not offer the full power of working with a board

iainmwallace commented 2 years ago

This would be incredibly helpful

slodge commented 1 year ago

I'd also love this.

At the moment, I can't see any way to access API protected pins by their vanity urls which is a shame as vanity is a nice feature in Connect.

juliasilge commented 1 year ago

Have you tried using board_url() for this, as outlined in the docs here? It would work like this with your vanity URL:

library(pins)
board <- board_url(c(
  my_vanity_url_pin = "https://colorado.posit.co/rsc/great-numbers/"
))
board %>% pin_read("my_vanity_url_pin")
#>  [1]  1  2  3  4  5  6  7  8  9 10

Created on 2022-12-15 with reprex v2.0.2

I believe if you have access to that content on Connect, you should be able to run this; it wouldn't necessarily need to be set to "Anyone - no login required". Nope, this was not correct; this only works for pins set to "Anyone - no login required".

slodge commented 1 year ago

@juliasilge Thanks. Just tried that with an example pin on our server.

For "no login required" the data downloaded.

For "all users - login required" I got:

Error in http_download(url = paste0(url, "data.txt"), path_dir = cache_dir,  : 
  Not Found (HTTP 404).

I tried also setting CONNECT_API_KEY and CONNECT_SERVER but that didn't seem to help. There would have to be some auth mechanism here - so maybe that's why the docs have only listed the vanity urls in the "Public" section?

Running Connect version 2022.11.0, calling from R4.2

juliasilge commented 1 year ago

Ah yep @slodge you're right; thanks for checking.

slodge commented 1 year ago

I'm on the outside of our firewall for a week now (aka it's Xmas!), but I'll try to take a look at this - since rsc_get simply adds a header Auth token to the httr request, it does feels like it should be possible to make this work with just a very small PR 👍

juliasilge commented 1 year ago

I don't think it makes sense to use board_connect() for this because it's really built around looking up content by name. There isn't a mapping in the API from vanity URL to content GUID or name, just the other way around.

I am not sure that we want to change board_url() to use the functions like rsc_auth() and rsc_download() because those are all about the board (the authentication details are part of the board). What may end up making the most sense is a new read-only board type such as board_connect_url() that can reuse rsc_auth() and rsc_download() but doesn't need to look up content via the API.

slodge commented 1 year ago

From the user of the code perspective, I'd love the rsc API to include the vanity metadata to enable the mapping...

For this one, how about adding some form of auth or headers parameter to board_connect_url? That'd enable future work for other board platforms too?

juliasilge commented 1 year ago

Yes, I think that a new board_connect_url() would have authentication stored in the board object, very similarly to board_connect(). What would be different is that it would not look up content via the API.

slodge commented 1 year ago

Back from holiday... have written something in #696

Current working code is:

devtools::load_all()

auth <- board_url_auth_connect(key=Sys.getenv("UAT_KEY_1"))

board <- board_url(c(
    my_vanity_url_pin = "https://connect-uat.example.com/example_mtcars_pin/"
  ), 
  auth=auth,
  use_cache_on_failure = FALSE
)

data <- board %>% pin_read("my_vanity_url_pin")
data
slodge commented 1 year ago

As I wrote those I am wondering about whether to go closer to @juliasilge's suggestion of a new board type - as it would be nice to not have to provide a named vector of urls - e.g. to somehow provide an api closer to:

devtools::load_all()

auth <- board_url_auth_connect(key=Sys.getenv("UAT_KEY_1"))

board <- board_vanity(
  "https://connect-uat.example.com/example_mtcars_pin", 
  auth=auth,
  use_cache_on_failure = FALSE
)

data <- board %>% pin_read()
data

or alternatively something like:

devtools::load_all()

auth <- board_url_auth_connect(key=Sys.getenv("UAT_KEY_1"))

board <- board_vanity(
  auth=auth,
  use_cache_on_failure = FALSE
)

data <- board %>% pin_read("https://connect-uat.example.com/example_mtcars_pin")
data

I appreciate that neither of those really match the shape of the current board and pin_read API surfaces... but they feel "nice" from a user perspective...

Just adding them here as suggestions/ideas - not sure they'll ever go anywhere!


I guess we could add a helper function just for single vanities:

pin_read_vanity <- function(url, auth_type = c("manual", "envvar"), key = NULL, use_cache_on_failure = is_interactive(), ... ) {

   auth <- board_url_auth_connect(auth_type, key)
   board <- board_url(c(
       my_vanity_url_pin = url
     ), 
     auth=auth,
     use_cache_on_failure = use_cache_on_failure 
   )

   data <- board %>% pin_read("my_vanity_url_pin", ...)
   data
}

.... but I can see it being contentious that this pin_read_vanity doesn't take board as a parameter....

juliasilge commented 1 year ago

Thank you so much for your work on this so far @slodge! 🙌

One good option is to keep this looking as much as possible like the existing board_connect() and board_url(), like so:

library(pins)
board <- board_connect_url(c(
  vanity01 = "https://colorado.posit.co/rsc/great-numbers/",
  vanity02 = "https://colorado.posit.co/rsc/other-great-numbers/"
))
board %>% pin_read("vanity01")

What are your thoughts on that? Is the ability to keep multiple vanity URLs in the same board object important?

We could also make this work for a single URL (probably not too hard to support both):

library(pins)
board <- board_connect_url("https://colorado.posit.co/rsc/great-numbers/")
board %>% pin_read()

The function signature for this new board would be almost exactly the same as board_connect():

board_connect_url <- function(urls,
                              auth = c("auto", "manual", "envvar", "rsconnect"),
                              server = NULL,
                              account = NULL,
                              key = NULL,
                              cache = NULL,
                              name = "posit-connect-url",
                              versioned = TRUE,
                              use_cache_on_failure = is_interactive()) {
 ...
}
slodge commented 1 year ago

@juliasilge Thanks 👍

I'm not keen on all the copy paste involved in creating another board_* variant - that's why PR #696 added the auth into the existing board_url. It's not the initial copy paste that makes my spidersense tingle - it's the later maintenance.

I also kept the auth I added as non-connect-specific as I could - in theory, I could imagine an S3 or Azure Blob pin which required some auth (e.g. a temporary access key sent in http headers) and I was hopeful they could all go via the existing board_url too.

However, if the project owners/posit would prefer to create more board_ variants, and would prefer one per added auth then happy for you to make that call :+1:

Is the ability to keep multiple vanity URLs in the same board object important?

I started writing:

To me, no. When you have to specify each URL anyway, then one board per URL is the same effort for the coder?

But then I though about it... so:

To me, yes. It's important that the board_ polymorphism works and that someone writing functions/libraries can rely on board_url working in a similar way to board_connect, board_azure, etc

(Aside: I am optimistic that at some point each Connect instance gets some ability to host boards so we can add some logical grouping to pins - but I think it might be too early to guess how vanity urls would work in multi-board connect instances!)

We could also make this work for a single URL

Changing the signature of pin_read() so that it can work with a NULL name feels like a change that would only work for a single url board - I like the look of the code, but wasn't sure if that change was OK - is it?


Sorry - more questions than answers :)

slodge commented 1 year ago

@juliasilge I'd really love to get https://github.com/rstudio/pins-r/pull/696 or something done. How do decisions get made on what the design should be?

The more we use pins with Connect, the more I want vanity URLs to work for both read and write - e.g. we really don't want to see individual user names in a pin name (people come and go, data lives on forever!)

juliasilge commented 1 year ago

@slodge Right now I'm working on getting a release done and then this will be next up! We are going to go with this option, if you would like to change your PR or open a new one:

library(pins)
board <- board_connnect_url(c(
  my_vanity_url_pin = "https://colorado.posit.co/rsc/wacky-weird-numbers/"
))

board %>% pin_read("my_vanity_url_pin")
slodge commented 1 year ago

I'm dithering...

My coding soul hates copy pasta... and I'm wondering if there's some way we could see this integrated into the existing board_connect rather than creating a new copied and pasted readonly board...

There's an interesting comment https://github.com/rstudio/connectapi/blob/ec4883e5012c3133ea404d4b08dc4da41cf9e0df/R/audits.R#L43 about # TODO: why does vanities not work?

... but I'm wondering if we should/could be using:

I hope @colearendt (and maybe @kelli-rstudio) don't mind me mentioning them here - but suspect they might have useful and more knowledgeable input than me here :)


This might be a new issue... but while we're in there, this library really should remove the 1000 hard-coded limit in this library - I think we wrote some purrr::walk code this week which will see us go higher than that at some point this year - https://github.com/slodge/pins-r/blob/b50eb57ae5879e28f85e42940f8f4944b49ff471/R/board_connect.R#L166


Good luck with the release - this can definitely wait 👍

juliasilge commented 1 year ago

I've got a new board_connect_url() up and running in #706!

If any of you are interested in installing it to try it out in your particular situation, that would be super helpful. You can install via devtools::install_github("rstudio/pins-r@board-connect-url"). As a heads up, this is a read-only board that does not support versioning.

slodge commented 1 year ago

Still feel this is adding code that isn't really wanted...

Why not just get the vanity urls exposed in the Connect APIs - then pins doesn't need 2 separate Connect implementations, and vanity urls would support list, write and versioning as well as read?

... I'm also on holiday at the moment so can't test for at least a week - sorry :)

juliasilge commented 1 year ago

@slodge We'll definitely appreciate any feedback on the PR when you get a chance! Hope your vacation is going well.

Unfortunately changing how vanity URLs are handled/supported on the Connect side isn't doable currently, so we've gone for a solution that can be implemented now. We're focusing on the best way to provide a solution given the constraints of the Connect API.

hadley commented 1 year ago

@slodge in the fullness of time, we hope that vanity urls will "just work" with board_connect(), but it's going to take some time to get the API work on the connect roadmap, implement it, then get it into a release. So the goal of board_connect_url() is to create that's useful now, even as we work towards improvements in the connect API in the long term.

slodge commented 1 year ago

@hadley @juliasilge Thanks... and....


fullness of time, we hope.... some time to get the API work on the connect roadmap, implement it, then get it into a release

Firstly, sorry - my flippant mid-holiday post was clearly too short yesterday... sorry.

I definitely was not trying to propose a big project here. I was hoping this could be done via a v small bug-fix - i.e. making GET /v1/content return a content_url computed from it's vanity url - which is what the docs says it does anyway.

image https://docs.posit.co/connect/api/#get-/v1/content

I guess this might be a bigger task than it looks from the outside; I know how very small bugfixes can quickly expand to fill my entire month; and I appreciate you might want to bundle this change in with other changes...

...so no worries if you want to put something bigger on the roadmap instead 👍


Secondly, thinking about it, I think there is also an opportunity to use vanity_url inside the existing board_connect without any work on the server side.

The pin_list.pins_board_connect function uses the API GET /__api__/applications - https://github.com/rstudio/pins-r/blob/main/R/board_connect.R#L351

This is not listed on https://docs.posit.co/connect/api/ (I assume it's the internal API for the dashboard) - but luckily it does already return the a vanity url flag and the vanity urls themselves.

image Company name removed as this is definitely just me asking - I'm on holiday :)

It's a choice made in the pins code to key by username/content-name and to not use the vanity urls provided - in https://github.com/rstudio/pins-r/blob/main/R/board_connect.R#L351

I appreciate using this might require a small amount of rework in the board_connect code to change e.g. rsc_find to use GET /__api__/v1/vanities to find GUID by vanity URL, but it all feels doable...

If you're open to the idea I'd be happy to dive in and assist with some of that in a PR at some point "in the next month"... but if you just want to add another board variant and push "doing it properly" down the road, then I'll also understand 👍


Anyways... sorry for creating noise with my short response yesterday. I'll close https://github.com/rstudio/pins-r/pull/696 since it also wasn't the approach wanted; and if you want to push ahead with a new board, then I promise to stop pestering you about alternatives. I appreciate there's always more than one way to do things 👍

hadley commented 1 year ago

@slodge I see that the v1/content end point returns the vanity url, but we can only search for the content by the name or the guid. The vanity url ends give us a way to find the guid given a vanity url, but we have the opposite problem: we will have the vanity url and want the guid.

I think pin_list() is a bit of a red-herring; yes we probably could download and cache the meta data for all pins on a connect instance, but I think that's going to be quite slow.

What am I missing?

slodge commented 1 year ago

I doubt you're missing much! And you've obviously got much better vision than I have on how big a job this is inside Connect (and how many other things it's competing with)

I think technically we're on a par about how different approaches could work 👍

My instincts are saying:

However... this is Posit's server code and Posit's API - I'm just one user - so I'm stopping talking now - this is a tiny issue... so probably already talked way too much 👍

juliasilge commented 1 year ago

We appreciate your thoughtful feedback @slodge! We have had enough requests for this feature (even given the constraints of being read only without support for versioning) that we think it is worth implementing with the Connect API as it is now.

Unfortunately, the /vanities endpoint requires admin access which we don't expect pins users to have. I don't believe there is any way to get the GUID from the vanity URL for a regular user. This is the basic constraint that results in the board needing to be read only without support for versioning.

slodge commented 1 year ago

Just to check for myself I wasn't going completely mad - and because I've sadly come back from holiday today - I had a go at writing some code. Pushed it to https://github.com/rstudio/pins-r/pull/716 - but am not putting it forward as a suggestion - I was just checking I wasn't misunderstanding anything. I still totally respect your team own the server code, and maintain this package too - so please do push ahead with board_connect_url() as planned :+1:

... I'd now better get back to the mountain of unread out-of-office emails - thanks for the code distraction!

kmasiello commented 1 year ago

Really enjoying the thoughtful comments and contributions, and I appreciate how thoughtful everyone is being about trying to implement a good solution to what is turning out to be a sticky issue!

I think I can add a little color based conversations I've had with other users. I hear two separate pain points captured in this issue:

Pain point 1 For organizationally-important pins, a pin name of "/" is undesirable.

Pinning by service account (e.g., "finance-team/data") conveys better pin provenance and governance. This is an ideal state, but not currently possible. In the interim, can working with pins based on vanity URL suffice? Not completely under this iteration because we're lacking the pin versioning and writing capabilities.

Pain Point 2a and 2b (I lumped these because the persona / use case overlaps heavily) We're already accustomed to sharing a Connect vanity URL for the "production" version of an application so end consumers have one clean, static URL to access content. The persona and use case here also wants the "production" version of a pinned dataset. This user only needs read access to the data, and there is not a need to access previous versions because we only want the version deemed production by the vanity URL. We can assign a vanity URL to a pin today, however...

Pain point 2a is that we can't work with the vanity URL for a pin when reading data. Additionally, we lose out on the ability to hot swap the production version of a pin by vanity URLs like one can with other content.

Pain point 2b is that reading pins from Connect is a two-step process that requires the pin consumer to input via a board_* function: (a) the server address, (b) auth details; and then input via pin_read (c) the pin name. This isn't a painful process for someone who is in the muck day to day. But the the publisher and end data consumer of a vanity URL-ized pin just want quick and low-barrier access to that pin.

If I have a vanity URL, (a) and (c) are already specified by the URL. It is redundant to use a board_url* function and a pin_read* function in this workflow. The ideal solution would be to combine the board definition function and pin read function into a single step, e.g., pin_read_from_connect(<url>, <API key>). This is an anti-pattern to how pins currently works, and sounds like a heavy lift to smoosh these together.

This is a tough one all around. Around Pain point 1, the read-only limitations and failure to solve the underlying problem of pinning by service account make this tough. If the two-step workflow in Pain point 2b could be addressed, it would feel like a slam dunk for that pain point, but there be dragons in implementing that.

slodge commented 1 year ago

Thanks 👍 Writing the code is the easy bit.

I just tried and it's easy to guess a Connect board - e.g. https://github.com/rstudio/pins-r/pull/717. That is only a few additional lines on top of #716. However, not sure it feels in the spirit of the pins API - it feels like it'd be very Connect specific... It might feel better somehow if it tried to create a board_url and if that board_url detected it was a connect somehow (e.g. using the hostname)? But even this feels like it's trying too hard to make the existing API surface fit the pain points/use cases....

juliasilge commented 1 year ago

Thanks to all for your patience on this issue! 🙌 We landed on an option in #732 that we feel good about both adding now and maintaining/supporting in the long run:

library(pins)
board <- board_connect_url(c(
  numbers = "https://colorado.posit.co/rsc/some-nice-numbers/"    ## this is a private pin
))
board %>% pin_read("numbers")
#>  [1]  1  2  3  4  5  6  7  8  9 10

Created on 2023-04-14 with reprex v2.0.2

This only works with environment variable authentication, and you can see new documentation here. You would install with remotes::install_github("rstudio/pins-r") to use this right now.

We will continue to plan for Connect-specific features like service accounts in the future.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.