Closed juliasilge closed 1 year ago
Tested and works as expected. If I'm a user, I can work with approach well, though I think the documentation should be explicit in noting that the hash is not affected by changed metadata. I like that it's an opt-in argument, which means I will be reading the docs to understand what I'm getting into. There might be situations where the user would want the status (pinned or didn't pin) to be returned in addition to the pin name. Every use case I envision though, I can noodle through how I'd determine the pinning status by other means. I think it would be wise to keep our ears open as to whether this is a need from the community that we'd implement in the future.
Thank you so much @kmasiello! 🙌
There might be situations where the user would want the status (pinned or didn't pin) to be returned in addition to the pin name.
This is a good point. Right now, pin_write()
only returns the name
(like "julia.silge/nice-numbers"
) but we have been discussing in #592 how to better organize what should be returned. I'll add this info to that issue and we can see what would work best.
Do we think that now in this PR pin_write()
should return something different than the name if it does not update the pin? Or should we wait for broader changes to what is returned?
Refined the warning message a smidge:
library(pins)
b <- board_connect()
#> Connecting to Posit Connect 2023.03.0 at <https://colorado.posit.co/rsc>
b %>% pin_write(1:10, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/really-nice-numbers'
b %>% pin_write(1:10, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Warning:
#> ! The hash of pin "julia.silge/really-nice-numbers" has not changed.
#> • Your pin will not be stored.
b %>% pin_write(1:11, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/really-nice-numbers'
Created on 2023-05-01 with reprex v2.0.2
Do we think that now in this PR
pin_write()
should return something different than the name if it does not update the pin? Or should we wait for broader changes to what is returned?
My gut says this should be handled with the broader considerations of what we return. I don't want to see a bespoke solution here that isn't compatible with future work.
Additional suggestion and comments after reviewing with users -
check_hash
may be better named as check_data_hash
to emphasize that metadata is not checked.
Checking metadata for changes is an outstanding ask. A sample scenario is using check_hash = TRUE
but wanting to update the metadata to reflect a "current as of" date. You would want to be able to indicate the pin is still fresh, even if the data hasn't changed.
A sample scenario is using
check_hash = TRUE
but wanting to update the metadata to reflect a "current as of" date.
Can I make sure I am understanding this? It sounds like users are expressing a desire to:
This would require a more extensive change than in this PR, will be pretty tough in the near term, and likely would require special handling of different backends (S3 vs. Connect vs. folders vs. ...). As of now, updating the metadata but not the pin contents isn't something straightforward to implement. We could consider how to approach this problem in the future.
I tend to think we should stick with check_hash
as the argument name since we will likely think through checking portions of the metadata as well (title, description, tags, and user metadata could all currently change without changing the pin hash).
I tend to think we should still go with the change as scoped here because it lets us offer a solution to the storing-many-copies problem. It does mean folks have to choose between having the metadata reflect the last time the pin contents really changed and the last time a write was attempted; this still seems better than the situation as it is now.
Agreed. As currently scoped, this helps the main use case tremendously. If I'm very keen to only store the changed data frame but have a record of the last write attempt, I can bake this logging info into my upstream ETL process. Similar to the other comment about returning the "wrote / didn't write" status, it will be good to keep ears open to use cases around metadata changes.
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Addresses #589
This does not solve all our hashing issues (for example, the
pin_hash
still only looks at the pin contents and not the metadata) but it is a small step toward a better hashing situation.After talking with users, I think that an opt-in new argument is the best way forward (rather than, say, a new function or new default behavior). This new
check_hash
arg comes after the dots and defaults toFALSE
. It does require another read ofpin_meta()
to get the old hash.Created on 2023-04-26 with reprex v2.0.2
With this PR as is now,
pin_write()
will still return the name invisibly but does not store the pin and generates a warning. Thoughts?