r-lib / httr2

Make HTTP requests and process their responses. A modern reimagining of httr.
https://httr2.r-lib.org

Cannot upload more than 128 files using req_perform_parallel #487

Closed maxsutton closed 2 months ago

maxsutton commented 2 months ago

When uploading more than 128 files, req_perform_parallel fails with an error message:

Error in gzfile(file, "rb") : all 128 connections are in use

This occurs even when restricting the curl pool to only use 2 connections at once.
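
The 128 in that message is the default size of R's own connection table (files, gzfiles, sockets and so on). That table is separate from the curl pool, which would explain why `total_con` and `host_con` make no difference. A minimal sketch of how explicitly opened file connections occupy slots until they are closed; `count_open()` is a helper made up for this illustration:

```r
# Hypothetical helper: number of user-level connections currently open
# (stdin/stdout/stderr are not included unless all = TRUE).
count_open <- function() nrow(showConnections())

tmp <- tempfile()
writeLines("hello", tmp)

before <- count_open()

# Open five connections explicitly; each one holds a slot until closed.
cons <- lapply(1:5, function(i) file(tmp, "rb"))
during <- count_open()

# Closing the connections releases the slots again.
for (con in cons) close(con)
after <- count_open()

c(before = before, during = during, after = after)
```

Here `during` is five higher than `before`, and `after` returns to the starting value.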

It seems that httr2 opens some files by itself (such as by checking utils::packageVersion("httr2")), so the real limit is closer to 124 files. I have tested the same setup below using req_perform_sequential(), and it succeeds.
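
Besides req_perform_sequential(), a possible stopgap is to perform the requests in batches that stay well under the connection limit. This is only a sketch: `perform_in_batches()` and its `batch_size` argument are hypothetical names, not part of httr2, and it assumes that the file connections of one batch are released before the next batch starts, which the diagnosis further down this thread suggests but does not guarantee.

```r
# Hypothetical helper (not an httr2 function): split `reqs` into
# consecutive batches, call `perform` on each batch in turn, and
# return the results as one flat list in the original order.
perform_in_batches <- function(reqs, perform, batch_size = 100) {
  stopifnot(batch_size >= 1)
  # Batch index for every element: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(reqs) / batch_size)
  batches <- split(reqs, batch_id)
  results <- lapply(batches, perform)
  # Flatten one level so the output lines up with `reqs`.
  unlist(unname(results), recursive = FALSE)
}

# Intended use with httr2 (not run here):
# resps <- perform_in_batches(
#   reqs,
#   function(batch) req_perform_parallel(batch, pool = pool, on_error = "continue")
# )
```

Passing the performer in as a function keeps the helper independent of httr2, so it can be tried out with a stand-in before pointing it at a real endpoint.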

Code

Note: With an endpoint of example_url(), a good request fails with an HTTP 404 error, but req_perform_parallel continues. When using more than 128 files, the error occurs before the first request is initiated, so the endpoint isn't a concern.

library(httr2)

N <- 10

test_upload <- function(N, endpoint = example_url()) {
  pool <- curl::new_pool(total_con = 2, host_con = 2)

  files <- sapply(1:N, function(x) {f <- tempfile(); writeLines(as.character(x), f); f})

  reqs <- lapply(files, function(f) {request(endpoint) |> req_body_file(f)})

  parallel <- req_perform_parallel(reqs, pool = pool, on_error = "continue")
}

Good (and expected behaviour)

small <- test_upload(10)
good <- test_upload(120)

Expected output: a progress bar and no errors. small and good are lists of the status of the requests made.

Bug

bug <- test_upload(129)
Error in gzfile(file, "rb") : all 128 connections are in use

## Session info

```
─ Session info ───────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2024-07-10
 rstudio  2024.04.0+735 Chocolate Cosmos (desktop)
 pandoc   NA

─ Packages ───────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
 callr         3.7.6   2024-03-25 [1] CRAN (R 4.4.0)
 cli           3.6.2   2023-12-11 [1] CRAN (R 4.4.0)
 curl          5.2.1   2024-03-01 [1] CRAN (R 4.4.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
 digest        0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
 fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
 fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
 httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
 httr2       * 1.0.1   2024-04-01 [1] CRAN (R 4.4.0)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.4.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
 pkgload       1.3.4   2024-01-16 [1] CRAN (R 4.4.0)
 processx      3.8.4   2024-03-16 [1] CRAN (R 4.4.0)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
 promises      1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
 ps            1.7.6   2024-01-18 [1] CRAN (R 4.4.0)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
 rappdirs      0.3.3   2021-01-31 [1] CRAN (R 4.4.0)
 Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.4.0)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
 rlang         1.1.3   2024-01-10 [1] CRAN (R 4.4.0)
 rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
 shiny         1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
 stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
 usethis       2.2.3   2024-02-19 [1] CRAN (R 4.4.0)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
 webfakes      1.3.1   2024-04-25 [1] CRAN (R 4.4.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
```

hadley commented 2 months ago

Slightly simpler reprex:

library(httr2)

temp <- tempfile()
writeLines("Hello, world!", temp)

# OK
reqs <- lapply(1:129, function(i) request(example_url()))
parallel <- req_perform_parallel(reqs, on_error = "continue")

# NOT OK
reqs <- lapply(1:129, function(i) request(example_url()) |> req_body_file(temp))
parallel <- req_perform_parallel(reqs, on_error = "continue")

So I'm holding on to connections somewhere in the pathway that serves the file.

...

Hmmm, I bet this is it:

    con <- file(data, "rb")
    # Leaks connection if request doesn't complete

So I'll need to think of a better way to clean up the connection created here.

hadley commented 2 months ago

Maybe req_body_apply() needs to include some callback that will close the open connection? And then maybe give req_handle() some sort of done() callback which we can call from $succeed() and req_perform()?

...

Hmmm, it must be more complicated than that because I do see the correct close function getting called.

...

Ooooh, the problem is that we're opening the connection when we register the call, but we only close it when it's done.
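
To illustrate that diagnosis outside of httr2: if every request opens its file when it is registered, all the connections exist at once before any transfer starts, whereas opening on first read keeps only the active ones open. The sketch below is not httr2's code or its eventual fix; `reader_eager()`, `reader_lazy()` and `count_open()` are hypothetical names used only to contrast the two timings.

```r
# Hypothetical helper: number of user-level connections currently open.
count_open <- function() nrow(showConnections())

# Eager: the connection is opened as soon as the reader is created,
# mirroring "opening the connection when we register the call".
reader_eager <- function(path) {
  con <- file(path, "rb")
  list(
    read = function(n) readBin(con, "raw", n),
    done = function() close(con)
  )
}

# Lazy: the connection is only opened on the first read, and done()
# closes it if it was ever opened.
reader_lazy <- function(path) {
  con <- NULL
  list(
    read = function(n) {
      if (is.null(con)) con <<- file(path, "rb")
      readBin(con, "raw", n)
    },
    done = function() {
      if (!is.null(con)) {
        close(con)
        con <<- NULL
      }
    }
  )
}

tmp <- tempfile()
writeLines("Hello, world!", tmp)
baseline <- count_open()

# Registering five eager readers takes five slots immediately.
eager <- lapply(1:5, function(i) reader_eager(tmp))
eager_open <- count_open() - baseline
for (r in eager) r$done()

# Registering five lazy readers takes none until they are read from.
lazy <- lapply(1:5, function(i) reader_lazy(tmp))
lazy_open <- count_open() - baseline
for (r in lazy) {
  r$read(5L)
  r$done()
}
final_open <- count_open() - baseline
```

With the eager readers the number of open connections grows with the number of registered requests; with the lazy readers it is bounded by how many are being read at the same time.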

maxsutton commented 2 months ago

Thanks @hadley! I didn't have the headspace to dig into the package -- I've only just started to use it. So I really appreciated you sharing your thought process. Immensely helpful.