Thanks for the question @psimm
@jeroen Any advice on this? Do you think it's a good idea to let users set `host_con` to any integer value?
It's usually not a good idea to have too many connections to a single host, because it can be considered abusive and can even get you banned. It can also degrade performance, because of the overhead of starting new connections rather than reusing an existing one. Most browsers limit themselves to 6 concurrent connections per host. So even if a webpage has 100 images, the browser opens at most 6 connections to download all those images.
Also note that when the server supports HTTP/2 we can use multiplexing, so we can basically do many parallel http requests over a single connection. So in this case it is not needed at all to have more than one connection to a host.
Thanks @jeroen !
Based on Jeroen's feedback I think it's best to leave `host_con` at the default of 6. Thoughts @psimm ?
Thanks for the reply @sckott and thanks to @jeroen for the explanation!
My use case is the same as the one described in this issue: https://github.com/ropensci/crul/issues/47 and the question on StackOverflow it is based on: https://stackoverflow.com/questions/45573770/how-do-i-perform-parallel-asynchronous-post-api-calls-in-r/45575252.
Currently, `AsyncVaried` lets me send 6 requests to my API endpoint at a time, but I'd like to send many more concurrently (up to 1000). Is `host_con` the wrong place to adjust? Should I look into multiplexing or should I run `AsyncVaried` in parallel R processes? Any advice would be much appreciated.
> I'd like to send many more concurrently (up to 1000)

@jeroen if you can help here that'd be great, it's beyond my knowledge.
If a server supports HTTP/2 is multiplexing automatically used?
HTTP/2 multiplexing is used automatically if both the client and the server support HTTP/2. R currently supports HTTP/2 on all platforms except Windows; you can check this with `curl::curl_version()$http2`.
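
A minimal sketch of that check, assuming only the `curl` package. `parse_http_version()` and `negotiated_http_version()` are hypothetical helpers written for illustration, not part of `curl` or `crul`; the second one reads the status line of the final response to see which protocol version was actually negotiated with a given server.

```r
library(curl)

# Does the libcurl build that R links against support HTTP/2?
client_has_http2 <- isTRUE(curl::curl_version()$http2)
message("Client supports HTTP/2: ", client_has_http2)

# Hypothetical helper: extract the protocol version from a status line
# such as "HTTP/2 200" or "HTTP/1.1 200 OK".
parse_http_version <- function(status_line) {
  sub("^HTTP/([0-9.]+).*$", "\\1", status_line)
}

# Hypothetical helper: fetch a URL and report the protocol version of the
# final response (after any redirects). Requires network access.
negotiated_http_version <- function(url) {
  res <- curl::curl_fetch_memory(url)
  blocks <- curl::parse_headers(res$headers, multiple = TRUE)
  final <- blocks[[length(blocks)]]
  parse_http_version(final[1])
}

# Usage (not run here because it needs a reachable server):
# negotiated_http_version("https://example.com")
```

If the client reports `TRUE` but the helper returns `"1.1"` for your endpoint, the server is probably not offering HTTP/2, and multiplexing will not apply.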
@psimm yes, if you're really sure you want to fire 1000 parallel requests at the server, and the server doesn't support HTTP/2, then `host_con` would be the place to do that. But I certainly wouldn't make that the default behavior.
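
For illustration, here is a rough sketch of raising `host_con` by using the `curl` multi interface directly rather than going through `AsyncVaried`. `fetch_all()` is a hypothetical wrapper and the URLs in the usage comment are placeholders; `crul` itself does not expose this setting in the version discussed here.

```r
library(curl)

# Hypothetical wrapper: fetch all URLs concurrently through a dedicated pool.
# host_con caps the number of simultaneous connections to any single host;
# total_con caps connections across all hosts.
fetch_all <- function(urls, host_con = 6, total_con = 100) {
  pool <- curl::new_pool(total_con = total_con, host_con = host_con)
  results <- vector("list", length(urls))

  lapply(seq_along(urls), function(i) {
    curl::curl_fetch_multi(
      urls[[i]],
      # On success, keep the HTTP status code; on failure, keep the message.
      done = function(res) results[[i]] <<- res$status_code,
      fail = function(msg) results[[i]] <<- msg,
      pool = pool
    )
  })

  # Block until every queued request has completed or failed.
  curl::multi_run(pool = pool)
  results
}

# Usage (placeholder URLs, not run here because it needs network access):
# urls <- sprintf("https://example.com/api?id=%d", 1:50)
# statuses <- fetch_all(urls, host_con = 50)
```

Keeping the default at 6 and requiring an explicit argument for anything higher matches the advice above: a high value is an opt-in for servers you control or have permission to load heavily.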
Btw, if you're not specifically tied to `crul`, it may be easier to accomplish with `httr2`, see: https://httr2.r-lib.org/reference/multi_req_perform.html
@jeroen Fantastic, thanks so much for the detailed reply. I'll check if the server supports HTTP/2. I didn't know about `httr2` and will give it a try.
@sckott This issue can be closed from my point of view, unless you changed your mind about letting users optionally set `host_con`. Given @jeroen's advice, it is rarely a good idea. Thanks for your help!
Okay, closing issue
Is it possible to set the number of requests to make in parallel when using AsyncVaried?
In `R/asyncvaried.R`, line 227, `crulpool <- curl::new_pool()` is defined. `new_pool` has an argument `host_con` with a default of 6. Would this line have to be modified to allow making more than 6 requests to a single host in parallel?

Session Info
```r
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] crul_1.2.0

loaded via a namespace (and not attached):
[1] compiler_4.1.0 R6_2.5.1       tools_4.1.0    httpcode_0.3.0 curl_4.3.2
[6] renv_0.14.0
```