Allow `/` in pin names - Githubissues

hadley commented 3 years ago

For lightweight namespacing. Will need to carefully check that don't contain .. etc.

EKtheSage commented 3 years ago

I am curious if adding / to the name will in effect create a subdirectory?

This may be what I am trying to solve with board_rsconnect, currently it doesn't look like I can create subdirectory under board_rsconnect.

Perhaps related to this: https://github.com/rstudio/pins/issues/348#issuecomment-822651739, you can see how the data sets are organized under each folder/sub folder. Can we have the same functionality on RStudio Connect?

hadley commented 3 years ago

@EKtheSage it will probably create a subdirectory in the on-disk cache, and will effectively act like a subdirectory remotely, but won't actually be a subdirectory. This is a probably confusing answer, so a better question to ask would be: what properties of a subdirectory do you want a pin to possess?

EKtheSage commented 3 years ago

I produce multiple version of the same data set every month. Each data set has the same number of columns but has different values in the columns, due to the different parameters being applied to it.

So, my use case is I want to correctly retrieve the data set based on the month and the parameters I specify. I think storing the date and parameter in the metadata is the better way to go, but I am unsure how to use pin_read to get the data set I want as pin_read always returns the most recent version.

pin_write(
  board = board,
  x = iris,
  name = 'Iris_Data',
  desc = 'Iris Take Two',
  metadata = list(
    Calendar_Date = '2021-06-30',
    parameter = 'alpha'
  )
)

pin_write(
  board = board,
  x = iris,
  name = 'Iris_Data',
  desc = 'Iris Take Two',
  metadata = list(
    Calendar_Date = '2021-06-30',
    parameter = 'beta'
  )
)

# unsure how to retrieve alpha version

pin_write(
  board = board,
  x = iris,
  name = 'Iris_Data_2021_06_30_alpha',
  desc = 'Iris Take Two',
)

pin_write(
  board = board,
  x = iris,
  name = 'Iris_Data_2021_06_30_beta',
  desc = 'Iris Take Two',
)

pin_read(board, name = 'Iris_Data_2021_06_30_alpha')

hadley commented 3 years ago

That use case sounds out of scope for pins right now. I'd suggest sticking with putting the params in the pin name.

EKtheSage commented 3 years ago

Thank you. Then what is the purpose of metadata of a pin? Is it only useful when we call pin_meta to learn more about a particular pin?

hadley commented 3 years ago

Yes, it gives you additional information about the pin. Making it more database-y so that you could easily search it would be a huge amount of work.

hadley commented 3 years ago

It's now not entirely to me what this would imply, so I'm going to leave for now. Will eventually need to consider broadly, thinking about:

Listing all pins in a "directory"
How directories interact with the hierarchy implied by user names in board_rsconnect()
Whether it would be useful to allow access to individual files from pin_download() (particularly for boards, like board_kaggle(), where we're reading from non-pin data sources).

dgrtwo commented 3 years ago

We use / in basically all of our pins to AWS S3, so not having this in 1.0.0 is a breaking change for us (though one we can work around with some effort).

We like using / both to organize pins conceptually by project and subproject, and because we like that S3's browser interface lets us explore them as nested directories (even though they technically aren't)

hadley commented 3 years ago

@dgrtwo FWIW it shouldn't affect the legacy board, only board_s3(), so your code should continue to work. But now I'm wondering why I enforce this constraint globally — maybe I can just turn it off for board_s3() and board_azure()? I think there's some potential for weird stuff to happen if you did (say) board %>% pin_write("a") and board %>% pin_write("a/b"), but you're probably not doing that, and the weirdness probably isn't that weird/risky.

dgrtwo commented 3 years ago

I'd definitely be happy if it's preserved in AWS boards!

To document the strangeness (which I agree is unlikely to appear in the wild) what I see in legacy pins is:

pins::pin(mtcars, "a", board = "heap")
pins::pin(diamonds, "a/", board = "heap")
pins::pin(mpg, "a/b", board = "heap")

# Recovers all three well, in the right order:
pins::pin_get("a", board = "heap")
pins::pin_get("a/", board = "heap")
pins::pin_get("a/b", board = "heap")

So writing and reading work. But when I go in the AWS, I can't find the "a" (mtcars) pin. (I don't know if it's different in the new pins approach; but either way I don't think it's a problem).

hadley commented 3 years ago

@dgrtwo I remembered one important difference with the modern S3 board — it doesn't maintain a central listing of all pins (because it's basically impossible to do so from pins without introducing the possibility of race conditions). This means that I'm currently relying on directory listings to get a list of all pins. So adding a directory hierarchy would either make pin_list() substantially slower (because I now have to spider directories) or it'll be inaccurate (which might be ok if you don't rely on pin_list()/pin_search() and instead just pass around pin names).

dgrtwo commented 3 years ago

Oh: that makes sense. And while I wouldn't mind losing that behavior in pin_list I suppose it would be really confusing for someone who creates a pin then isn't able to see it in list.

At first I thought you could use the --recursive flag to S3's list, but looks like that's only in the command line interface, not the API (weird) https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

So I understand not supporting nested directories. I can work around it if we switch to the new pins implementation. (I do think this is going to be a common request as boards grow in complexity, though)

hadley commented 3 years ago

Yeah, I’ll keep thinking about it but not sure I’ll be able to figure it out in time for the 1.0.0 release.

hadley commented 3 years ago

@dgrtwo what if the subdirectory was a board parameter (as in #495)?

board_marketing <- board_s3(prefix = "marketing")
board_sales <- board_s3(prefix = "sales")
board_marketing_expt <- board_s3(prefix = "marketing/experimental")

hadley commented 2 years ago

I think I've decided it's better to do this namespacing at the board level, rather than at the pin level.

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

rstudio / pins-r

Allow `/` in pin names #453