mrc-ide / orderly2

https://mrc-ide.github.io/orderly2/
Other
7 stars 2 forks source link

orderly_location_pull_packet() doesn't require location= specification #154

Open jeffeaton opened 2 months ago

jeffeaton commented 2 months ago

Collaborators were surprised and slightly confused today that orderly_location_pull_packet() did not require specification of a location from which pulling.

That also gave rise to a similar question of what happens if the same packet exists at multiple locations—which is it pulling from?

Is the answer that it doesn't matter because the content must be the same? Will it default to pulling from whichever location appears first in orderly_location_list()?

I notice in the message below that message specifies it came from root1:

> orderly_location_pull_packet(name = "hello")
ℹ Looking for suitable files already on disk
ℹ Need to fetch 2 files (112 B) from 1 location
✔ Fetched 2 files (112 B) from 'root1' in 41ms.

Here's a little example I used to explore the behaviour for myself:

library(orderly2)

root1 <- tempfile()
root2 <- tempfile()
root3 <- tempfile()

## Create a report at root1
orderly_init(root1)

setwd(root1)
orderly_new("hello")

writeLines('
library(orderly2)
orderly_artefact("hello", "hello.txt")
writeLines("hello world!", "hello.txt")
',
"src/hello/hello.R"
)

id <- orderly_run("hello")

## Pull packet to root2
orderly_init(root2)
setwd(root2)

orderly_location_add("root1", "path", list(path = root1))
orderly_location_pull_packet(name = "hello", options = list(pull_metadata = TRUE))

## Create root3; add both root1 and root2 as locations
orderly_init(root3)
setwd(root3)

orderly_location_add("root1", "path", list(path = root1))
orderly_location_add("root2", "path", list(path = root2))

orderly_metadata_extract(options = list(allow_remote = TRUE, pull_metadata = TRUE))

orderly_metadata_extract(options = list(allow_remote = TRUE, location = "root1"))
orderly_metadata_extract(options = list(allow_remote = TRUE, location = "root2"))

orderly_location_pull_packet(name = "hello")
richfitz commented 2 months ago

See https://mrc-ide.github.io/orderly2/reference/orderly_search_options.html for details - this is behaving as expected and as designed, and is perhaps something that you will also have to unlearn from orderly1.

You have a series of remotes - most users will have zero or one. When pulling the default is to pull from all remotes; that is the query evaluated over the graph of packets contained everywhere. You can restrict this by using the location argument to the options there, as you have seen.

The message

✔ Fetched 2 files (112 B) from 'root1' in 41ms.

indicates that the files that satisfy that search are found at root1 but does not mean that other roots could not have also satisfied them.

jeffeaton commented 2 months ago

Thanks very much. Yeah, just to clarify, no concern here that there was unexpected behaviour, just lack of clarity from the docs and messages from orderly_location_pull_packet() about what the behaviour was.

I think this could be more quickly clear for user for the documentation for orderly_location_pull_packet() to (1) reiterate what the behaviour for searching is across locations, (2) enumerate what the options are, (3) refer user to docs for orderly_search_options().

Personally, I would also favor the location =, allow_remote =, and pull_metadata = to be named args to remind/prompt me (we've discussed the balance on this elsewhere).

Are the list of options available for the options = ... argument in orderly_location_pull_packet() exactly those defined in ?orderly_search_options? That wasn't clear to me from the docs:

 options: Options passed to orderly_search. The option ‘allow_remote’
          must be ‘TRUE’ as otherwise no packet could possibly be
          pulled, so an error is thrown if this is FALSE.

On tracing through now I see that directs me to ?orderly_search which then directs to ?orderly_search_options; but that's a fair bit of steps to figure out what the options = can do if I'm not quite sure what I'm looking for.