Closed jeffkeller87 closed 2 months ago
Thanks for this question @jeffkeller87! I think the short answer is "no" because we haven't built either R or Python pins with an eye toward being used directly from a shell or similar, but since it's all just files and directories, certainly you have options:
ls | tail -1
./latest directory is great too! Maybe it is the most straightforward for your situation.
I don't think it's likely that pins starts keeping a /latest directory since that's not the main use case we're targeting, but certainly you can use the directory structure (maybe together with a manifest file) to manage this from a shell in a couple of ways.
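The `ls | tail -1` idea can be sketched a little more defensively. This is a minimal sketch, not anything pins provides: the board path, the pin name `my_model`, and the `<UTC timestamp>-<hash>` version names are all made up for illustration, and `latest_version` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical board with one pin; the version directory names imitate a
# "<UTC timestamp>-<hash>" layout, which is an assumption for this sketch.
board=$(mktemp -d)
pin_dir="$board/my_model"
mkdir -p "$pin_dir/20230925T101500Z-a1b2c" \
         "$pin_dir/20230926T161828Z-9f3e1" \
         "$pin_dir/20230924T080000Z-77d0a"

# latest_version DIR: print the name of the lexicographically last subdirectory.
latest_version() {
  # The trailing "/" in the glob restricts matches to directories.
  # LC_ALL=C makes the sort byte-wise and independent of the caller's locale.
  for d in "$1"/*/; do
    [ -d "$d" ] && basename "$d"
  done | LC_ALL=C sort | tail -n 1
}

latest=$(latest_version "$pin_dir")
echo "latest version: $latest"

rm -rf "$board"
```

Because the timestamp is zero-padded and leads the name, lexicographic order matches chronological order down to the second. Any non-version directory that sorts after the digits (a `latest/` link, for example) would need to be filtered out first.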
Thank you @juliasilge for the very thoughtful response. I agree that replicating the /latest copy / link within pins probably isn't the right thing to do. However, if there is room to improve the interop surface of pins, I think that would be worth pursuing.
To that end, is the naming schema sufficient for determining the latest version? I figured the truncated hash would cause issues if more than one pin version was written within the same second. That's probably good enough for what I'm doing, but I can see it causing issues in other scenarios. Do you have a strong preference for the hash over sub-second markers?
The manifest file was the other option I considered. My optimism deflated a bit when I saw it was a YAML file rather than JSON--only because of how long it took me to convince my Infrastructure / IT people to install jq in our runner image. Theoretically, I could get them to install yq too :)
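If yq is not available, a flat YAML file can sometimes be read with awk alone. The sketch below assumes a manifest in which each pin name is a top-level key followed by a list of version paths, one per line. That shape is an assumption made for illustration, the sample contents are invented, and `latest_from_manifest` is a hypothetical helper. It is not a general YAML parser and would break on quoted keys or nested structures.

```shell
#!/bin/sh
# Invented sample manifest; the real file's layout should be checked first.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
my_model:
- my_model/20230925T101500Z-a1b2c/
- my_model/20230926T161828Z-9f3e1/
other_pin:
- other_pin/20230101T000000Z-abcde/
EOF

# latest_from_manifest FILE PIN: print the last version path listed for PIN.
latest_from_manifest() {
  awk -v pin="$2" '
    # A line that starts with neither a space nor "-" is treated as a pin name.
    /^[^ -]/ { current = $0; sub(/:.*$/, "", current); next }
    # List items under the requested pin: drop the "- " prefix and trailing "/".
    /^ *- / && current == pin { sub(/^ *- */, ""); sub(/\/$/, ""); print }
  ' "$1" | LC_ALL=C sort | tail -n 1
}

latest_path=$(latest_from_manifest "$manifest" my_model)
echo "$latest_path"

rm -f "$manifest"
```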
Oh yes, you are definitely correct that the timestamp doesn't distinguish between versions written within the same second. This has come up before and to date, the only time this has been a problem is in kind of "fake" situations, like when building a vignette or when people are writing tests in other packages that use pins. We haven't heard of problems with the timestamp in people's real work, since most folks are pinning, say, a model binary or a summarized dataset coming out of an ETL pipeline. Folks are generally not using pins for super high performance writing, at least so far.
In your use case, would subsecond information be practically important?
## what we do now:
format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC")
#> [1] "20230926T161828Z"
## we could do something like:
format(Sys.time(), "%Y%m%dT%H%M%OS2Z", tz = "UTC")
#> [1] "20230926T161828.26Z"
Created on 2023-09-26 with reprex v2.0.2
In my cases, there should be no chance of a sub-second temporal collision like that. But there are always those unexpected scenarios where another writer sneaks in at just the wrong time, and then you're pulling your hair out figuring out why the pin you just wrote isn't the one that gets read immediately after (using the ls | tail method).
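The collision described above is easy to reproduce without touching a board. In this sketch the two version names are invented and share the same second; only the trailing hash differs, so sorting falls back to the hash and the write order is lost.

```shell
#!/bin/sh
# Two hypothetical version names written within the same second, in write order.
# The "<timestamp>-<hash>" form is an assumption about the naming scheme.
first_written="20230926T161828Z-f00d1"
second_written="20230926T161828Z-0a7c3"

# With identical timestamps, the sort is decided by the hash suffix alone.
picked=$(printf '%s\n' "$first_written" "$second_written" | LC_ALL=C sort | tail -n 1)

if [ "$picked" = "$second_written" ]; then
  echo "picked the most recent write"
else
  echo "picked $picked, but $second_written was written last"
fi
```

Here the earlier write wins because its hash happens to sort later, which is exactly the surprise a reader relying on `ls | tail` would hit.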
Modifying the timestamp format would shrink the probability further, but it makes specifying an explicit version more onerous in pin_read().
I think the current behavior is fine as-is. If someone is writing this frequently intentionally, they probably don't want a versioned board anyway.
That makes a lot of sense. I'm going to leave this issue open for discussion in case other folks come by with this same need in the near future; we can reevaluate as we hear more on it. Thanks again for the question @jeffkeller87!
It sounds like we haven't seen a high need for improvements in this area so I am going to close this issue. We can revisit in the future if we hear more from users on this! 🙌
A follow-up to this post and similar to https://github.com/rstudio/pins-r/issues/590.
I love how I can use {pins} instead of maintaining my own artifact management process. It really cuts down on the amount of boilerplate code in my projects!
However, I often have a need to read pins from a system where installing either the R or Python {pins} package is not possible. In my case, these systems are ephemeral continuous integration runners with a limited set of software installed. Specifically, I am grabbing the latest model artifacts from S3 to COPY into a Docker image.
My current solution is to write artifacts to a latest/ prefix (or directory, if S3 is not the storage medium) in addition to a timestamped prefix. If the storage medium is a filesystem, I sym- or hard-link latest/ to the appropriate timestamped directory. The structure looks something like this:
From a system without {pins}, I can then reference a static path to get the latest artifacts.
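For the filesystem case, the linking step might look roughly like the following. This is a sketch under assumptions rather than my exact setup: the directory names, the `my_model` artifact folder, and the `model.bin` file are placeholders.

```shell
#!/bin/sh
# Placeholder layout: one artifact directory with a single timestamped version.
root=$(mktemp -d)
version="20230926T161828Z"   # hypothetical timestamped directory name
mkdir -p "$root/my_model/$version"
echo "model bytes" > "$root/my_model/$version/model.bin"

# -s: symbolic link; -f: replace an existing link;
# -n: treat an existing link to a directory as a file instead of descending into it.
# The target is relative, so the link stays valid if the tree is moved as a whole.
ln -sfn "$version" "$root/my_model/latest"

# A consumer without {pins} reads through the static path.
contents=$(cat "$root/my_model/latest/model.bin")
link_target=$(readlink "$root/my_model/latest")
echo "latest -> $link_target: $contents"

rm -rf "$root"
```

Rerunning the `ln -sfn` line with a newer version name repoints `latest` in place, so readers never need to know the timestamp.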
Without {pins}, is there a straightforward way to identify the latest pin version in a board?