Closed hadley closed 2 years ago
I am curious if adding /
to the name will in effect create a subdirectory?
This may be what I am trying to solve with board_rsconnect
, currently it doesn't look like I can create subdirectory under board_rsconnect
.
Perhaps related to this: https://github.com/rstudio/pins/issues/348#issuecomment-822651739, you can see how the data sets are organized under each folder/sub folder. Can we have the same functionality on RStudio Connect?
@EKtheSage it will probably create a subdirectory in the on-disk cache, and will effectively act like a subdirectory remotely, but won't actually be a subdirectory. This is a probably confusing answer, so a better question to ask would be: what properties of a subdirectory do you want a pin to possess?
I produce multiple version of the same data set every month. Each data set has the same number of columns but has different values in the columns, due to the different parameters being applied to it.
So, my use case is I want to correctly retrieve the data set based on the month and the parameters I specify. I think storing the date and parameter in the metadata
is the better way to go, but I am unsure how to use pin_read
to get the data set I want as pin_read
always returns the most recent version.
pin_write(
board = board,
x = iris,
name = 'Iris_Data',
desc = 'Iris Take Two',
metadata = list(
Calendar_Date = '2021-06-30',
parameter = 'alpha'
)
)
pin_write(
board = board,
x = iris,
name = 'Iris_Data',
desc = 'Iris Take Two',
metadata = list(
Calendar_Date = '2021-06-30',
parameter = 'beta'
)
)
# unsure how to retrieve alpha version
pin_write(
board = board,
x = iris,
name = 'Iris_Data_2021_06_30_alpha',
desc = 'Iris Take Two',
)
pin_write(
board = board,
x = iris,
name = 'Iris_Data_2021_06_30_beta',
desc = 'Iris Take Two',
)
pin_read(board, name = 'Iris_Data_2021_06_30_alpha')
That use case sounds out of scope for pins right now. I'd suggest sticking with putting the params in the pin name.
Thank you. Then what is the purpose of metadata
of a pin? Is it only useful when we call pin_meta
to learn more about a particular pin?
Yes, it gives you additional information about the pin. Making it more database-y so that you could easily search it would be a huge amount of work.
It's now not entirely to me what this would imply, so I'm going to leave for now. Will eventually need to consider broadly, thinking about:
board_rsconnect()
pin_download()
(particularly for boards, like board_kaggle()
, where we're reading from non-pin data sources).We use / in basically all of our pins to AWS S3, so not having this in 1.0.0 is a breaking change for us (though one we can work around with some effort).
We like using / both to organize pins conceptually by project and subproject, and because we like that S3's browser interface lets us explore them as nested directories (even though they technically aren't)
@dgrtwo FWIW it shouldn't affect the legacy board, only board_s3()
, so your code should continue to work. But now I'm wondering why I enforce this constraint globally — maybe I can just turn it off for board_s3()
and board_azure()
? I think there's some potential for weird stuff to happen if you did (say) board %>% pin_write("a")
and board %>% pin_write("a/b")
, but you're probably not doing that, and the weirdness probably isn't that weird/risky.
I'd definitely be happy if it's preserved in AWS boards!
To document the strangeness (which I agree is unlikely to appear in the wild) what I see in legacy pins is:
pins::pin(mtcars, "a", board = "heap")
pins::pin(diamonds, "a/", board = "heap")
pins::pin(mpg, "a/b", board = "heap")
# Recovers all three well, in the right order:
pins::pin_get("a", board = "heap")
pins::pin_get("a/", board = "heap")
pins::pin_get("a/b", board = "heap")
So writing and reading work. But when I go in the AWS, I can't find the "a" (mtcars) pin. (I don't know if it's different in the new pins approach; but either way I don't think it's a problem).
@dgrtwo I remembered one important difference with the modern S3 board — it doesn't maintain a central listing of all pins (because it's basically impossible to do so from pins without introducing the possibility of race conditions). This means that I'm currently relying on directory listings to get a list of all pins. So adding a directory hierarchy would either make pin_list()
substantially slower (because I now have to spider directories) or it'll be inaccurate (which might be ok if you don't rely on pin_list()
/pin_search()
and instead just pass around pin names).
Oh: that makes sense. And while I wouldn't mind losing that behavior in pin_list
I suppose it would be really confusing for someone who creates a pin then isn't able to see it in list.
At first I thought you could use the --recursive
flag to S3's list, but looks like that's only in the command line interface, not the API (weird) https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
So I understand not supporting nested directories. I can work around it if we switch to the new pins implementation. (I do think this is going to be a common request as boards grow in complexity, though)
Yeah, I’ll keep thinking about it but not sure I’ll be able to figure it out in time for the 1.0.0 release.
@dgrtwo what if the subdirectory was a board parameter (as in #495)?
board_marketing <- board_s3(prefix = "marketing")
board_sales <- board_s3(prefix = "sales")
board_marketing_expt <- board_s3(prefix = "marketing/experimental")
I think I've decided it's better to do this namespacing at the board level, rather than at the pin level.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
For lightweight namespacing. Will need to carefully check that don't contain
..
etc.