Open wlandau opened 2 years ago
It does not. We did look into it a while back and it is kind of tricky in R because you'd need to let curl run in another process behind something like future, then return a future to the user instead of the result. We never got anything like a working example but I think it's probably technically possible.
One alternative might be to run Paws itself in another process using future.
There is also an open PR for downloading files from S3 directly to disk which I suppose would help when running it in another process.
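To make the "run it in another process using future" idea concrete, here is a minimal sketch. It does not call Paws: `slow_request()` is a hypothetical stand-in for a blocking call such as a download, and the worker count is an arbitrary choice.

```r
library(future)

# Run blocking work in background R sessions.
plan(multisession, workers = 2)

# Hypothetical stand-in for a blocking Paws call; it sleeps instead of
# talking to AWS.
slow_request <- function(key) {
  Sys.sleep(1)
  paste("contents of", key)
}

# future() returns right away; the work runs in another R process.
f <- future(slow_request("myfile.csv"))

# The main session is free to do other things here.
is_done <- resolved(f)  # non-blocking check, TRUE or FALSE

# value() blocks until the result is available.
result <- value(f)
print(result)

# Shut down the background workers.
plan(sequential)
```

The caller gets a future back instead of the result and decides when to block on `value()`.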
Hi all,
I have been thinking about this. I think it is possibly a limitation of the current httr package, as it doesn't call curl's async functions, i.e. multi_add and multi_run. The key function we would want to use is curl::curl_fetch_multi. To get it, we could extend the current httr package.
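For reference, this is roughly how the curl multi interface works on its own, outside httr. It is a sketch under assumptions: the `file://` URL pointing at a temporary file is a stand-in for a real endpoint so the example needs no network access (it assumes a Unix-like path), and the callbacks are illustrative only.

```r
library(curl)

# Stand-in for a remote object: a local file read through a file:// URL.
src <- tempfile(fileext = ".txt")
writeLines("hello from curl", src)
url <- paste0("file://", normalizePath(src))

results <- list()
pool <- new_pool()

# Queue the request. Nothing is transferred yet.
curl_fetch_multi(
  url,
  done = function(res) {
    # Called once the transfer completes; res$content is a raw vector.
    results[[length(results) + 1]] <<- rawToChar(res$content)
  },
  fail = function(msg) warning("Request failed: ", msg),
  pool = pool
)

# multi_run() performs all queued requests and fires the callbacks.
status <- multi_run(pool = pool)
print(status$success)
print(results)
```

Several requests can be queued on the same pool before `multi_run()` is called, which is where the concurrency comes from.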
First step: extend the httr package to include curl::curl_fetch_multi
library(httr)

# Create new multi
write_multi_disk <- function(path, overwrite = FALSE) {
  if (!overwrite && file.exists(path)) {
    stop("Path exists and overwrite is FALSE", call. = FALSE)
  }
  httr:::request(output = write_function("write_multi_disk", path = path, file = NULL))
}

# add method to call curl::curl_fetch_multi
request_fetch <- function(x, url, handle) UseMethod("request_fetch")
request_fetch.write_multi_disk <- function(x, url, handle) {
  con <- file(x$path)
  curl::curl_fetch_multi(
    url, fail = failure, data = con, handle = handle
  )
  tryCatch({
    curl::multi_run()
  }, interrupt = function(cnd) {
    curl::multi_cancel(handle)
  })
  resp <- curl::handle_data(handle)
  resp$content <- httr:::path(x$path)
  resp
}

# TODO: better failure function to align with paws error handling
failure <- function(msg) {
  stop(msg)
}

# Testing new method
r <- httr::VERB(
  "GET",
  url = "https://www.google.com",
  config = write_multi_disk("temp.txt", TRUE)
)
httr::headers(r)
httr::status_code(r)
httr::content(r, as = "raw")
The big issue I see with this is the error handling; however, this method could be added/developed alongside the current PR #458.
Let me know your thoughts around this @wlandau @davidkretch 😄
@davidkretch, thanks for confirming. I thought that might be the case. @DyfanJones, that's a great point. Seems like async would belong in a package like httr. Looks like async is discussed a bit at (https://github.com/r-lib/httr2/issues/1.
Been thinking about this, and I think we can get async S3 downloads using the promises package, similar to what @davidkretch mentioned here:
One alternative might be to run Paws itself in another process using future.
Here is a basic example
library(paws)
library(promises)

future::plan(future::multisession)

s3 = paws::s3()

s3_async_download = function(Bucket, Key, Filename, svc) {
  then({
    future_promise(svc$download_file(
      Bucket = Bucket,
      Key = Key,
      Filename = Filename
    ), seed = TRUE)
  }, onRejected = function(){
    stop(sprintf("Failed to download s3://%s/%s", Bucket, Key))
  })
}

system.time({
  s3$download_file(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile1.csv"
  )
})
#> user system elapsed
#> 0.873 1.348 33.800

system.time({
  s3_async_download(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile2.csv",
    svc = s3
  )
})
#> user system elapsed
#> 0.063 0.005 0.091
Created on 2022-04-20 by the reprex package (v2.0.1)
Seems to be really promising 😉
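One caveat when reading the timings above: the second `system.time()` call presumably measures only how long it takes to create the promise, since `future_promise()` returns before the work finishes. The sketch below shows one way to wait for the result in a plain script. It does not use Paws: `fake_download()` is a hypothetical stand-in that sleeps instead of downloading, and the 30-second deadline is an arbitrary safety limit.

```r
library(promises)
library(future)

plan(multisession, workers = 2)

# Hypothetical stand-in for svc$download_file(); sleeps instead of downloading.
fake_download <- function(key) {
  Sys.sleep(1)
  key
}

finished <- NULL

# Creating the promise returns quickly; the work happens in a worker process.
created <- system.time({
  p <- future_promise(fake_download("myfile.csv"))
  then(p, onFulfilled = function(value) finished <<- value)
})
print(created[["elapsed"]])

# In a non-interactive script, run the later event loop so callbacks can fire.
deadline <- Sys.time() + 30
while (is.null(finished) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}
print(finished)

plan(sequential)
```

In an interactive session or a Shiny app the event loop runs on its own, so the explicit `later::run_now()` loop is only needed in scripts like this one.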
Does paws have a way to send API requests asynchronously, particularly for uploading and downloading to/from S3? I have heard curl has async built in.