rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
312 stars 63 forks source link

`created_time` for Connect needs to use a different ISO8601 variant #623

Closed bjfletcher closed 2 years ago

bjfletcher commented 2 years ago

pins seems to expect created_time with bundles to be in POSIXct format however we're seeing them in POSIXlt format.

RStudio Connect (on Kubernetes): Version: 2022.05.0 Build: v2022.05.0-0-g762d29c

This PR fixes it for us, however, we we will most probably want to implement something like this pseudocode:

  1. check created_time
  2. is it POSIXct format?
  3. if yes, parse as POSIXct
  4. if no, parse as POSIXlt

along with tests. But we'd like to check with you first about the approach before we implement the changes and push to this PR.

machow commented 2 years ago

Hey, thanks for opening this PR! Do you mind explaining a bit the issue encountered? What were you trying to do when you noticed this, and how did it cause a problem?

bjfletcher commented 2 years ago

Hello @machow! Thank you for your comment. I'm sorry I didn't provide any context.

So a simple test case:

board <- pins::board_rsconnect(auth = "envvar")
name <- 'pin-test'
board |>
  pins::pin_write(list(1, 2, 3), name = name)
board |>
  pins::pin_versions(name)

You will see that the created column is always NA.

So with this PR, the created column shows the actual dates.

I've checked with RStudio Connect, who confirmed it's a bug with pins:

The PINS package uses the RStudio Conect API api/v1/content//bundles.

Here's an example query against a Connect server with the following command:

curl --silent --show-error -L --max-redirs 0 --fail \ -H "Authorization: Key ${CONNECT_API_KEY}" \ "${CONNECT_SERVER}api/v1/content/914848b5-06a1-4d8b-9446-bfaa5f94b636/bundles" | jq

which returned the JSON of:

[ { "id": "1", "content_guid": "914848b5-06a1-4d8b-9446-bfaa5f94b636", "created_time": "2022-06-24T18:46:03Z", "cluster_name": null, "image_name": null, "r_version": null, "py_version": null, "quarto_version": null, "active": false, "size": 13333, "metadata": { "source": null, "source_repo": null, "source_branch": null, "source_commit": null, "archive_md5": "e608b17727a524543410405a43231128", "archive_sha1": "aa9c859cf6c5fdc6e1884937da26261eae3b909e" } } ]

As you can see, the created_time attribute returned is a formatted RFC3339 (https://datatracker.ietf.org/doc/html/rfc3339) character string. This API is defined within: https://docs.rstudio.com/connect/api/#post-/v1/content.

We did validate it to be returning the correct UTC time in this experiment

As such, it would be appropriate to use the POSIXlt function when parsing the value returned from the API, which is not what the current PINs package uses (as per the PR).

This does seem to be a bug within the PINs package and there are no workarounds from within Connect available.

However, I do not see the need to check for a different format, as the v1 API will never return the created_timeas the number of seconds since epoch.

I understand that this means the created column has always been NA because RStudio Connect has never returned the compact version.

So if that is correct, we'll want to merge this PR as is. :)

juliasilge commented 2 years ago

I have experienced this myself @bjfletcher so thank you so much for the contribution! We are in the midst of revamping how authentication is handled in pins for CI and testing but we will merge this in as soon as we get that hammered out. 👍

machow commented 2 years ago

@bjfletcher thanks for the helpful context on the bug! I'm noticing that the pin_versions code for RSConnect is using different metadata, than e.g. pin_meta and pin_search use (pin_versions gets it from RSConnect API bundle metadata, while these other functions use each version's data.txt files).

It could be that pin_versions did this as an optimization (probably faster to retrieve version data this way). But it likely produces slightly different creation times than you'd get by running pin_versions against any other backend.

It seems okay to me to get the data in the way pin_versions is doing it, and apply this fix, but @juliasilge wanted to flag for us to double check beforehand!

edit: wait -- other boards in R pins (and pins-python) use something like ls to list all the version names, then parse the created at info from that. (e.g. this code in board_folder). This is impossible in RSConnect, so it makes sense that hitting the API for a bundle's created datetime is the best we can do (since pulling each data.txt would be slow).

juliasilge commented 2 years ago

Thanks for this PR @bjfletcher!

github-actions[bot] commented 2 years ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.