r-lib / httr

httr: a friendly http package for R
https://httr.r-lib.org

Bug: Executing `httr::GET()` in parallel results in an error when no single-core GET request was issued before. #749

Closed rkrug closed 9 months ago

rkrug commented 9 months ago

macOS Sonoma, MacBook Pro, M1 Pro chip

Using parallel::mclapply() to issue GET requests in parallel results in errors on all cores.

After a single-core request has been executed once, the error disappears.

r$> library(httr)

r$> parallel::mclapply(1:2, function(x){httr::GET("http://openalex.org/")})
objc[45797]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[45797]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[45796]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[45796]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[[1]]
NULL

[[2]]
NULL

Warning message:
In parallel::mclapply(1:2, function(x) { :
  scheduled cores 1, 2 did not deliver results, all values of the jobs will be affected

r$> httr::GET("http://openalex.org/")
Response [https://openalex.org/]
  Date: 2023-11-12 11:16
  Status: 200
  Content-Type: text/html; charset=UTF-8
  Size: 1.02 kB

r$> parallel::mclapply(1:2, function(x){httr::GET("http://openalex.org/")})
[[1]]
Response [https://openalex.org/]
  Date: 2023-11-12 11:17
  Status: 200
  Content-Type: text/html; charset=UTF-8
  Size: 1.02 kB

[[2]]
Response [https://openalex.org/]
  Date: 2023-11-12 11:17
  Status: 200
  Content-Type: text/html; charset=UTF-8
  Size: 1.02 kB

r$> sessioninfo::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Sonoma 14.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Zurich
 date     2023-11-12
 pandoc   3.1.9 @ /opt/homebrew/bin/pandoc

─ Packages ──────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 curl          5.1.0   2023-10-02 [1] CRAN (R 4.3.1)
 httr        * 1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
 jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)

 [1] /Users/rainerkrug/R/library/aarch64-apple-darwin20/4.3
 [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

─────────────────────────────────────────────────────────────────────────────────────────────

rkrug commented 9 months ago

Based on, among others, https://community.rstudio.com/t/running-parallel-on-mac/142580/6, I have set OBJC_DISABLE_INITIALIZE_FORK_SAFETY to YES in environ:

Sys.getenv("OBJC_DISABLE_INITIALIZE_FORK_SAFETY")
[1] "YES"

But no change.
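
The only workaround visible in the transcript above is the warm-up request: once a single `httr::GET()` has run in the parent process, the forked workers succeed. A minimal sketch of that pattern follows. The helper name `parallel_get()` and its `cores` argument are hypothetical, and this mirrors the observed behaviour rather than a confirmed fix, since the underlying cause is not established in this thread.

```r
# Hypothetical helper (not part of httr): issue one request in the parent
# process before forking, then run the remaining requests via mclapply().
parallel_get <- function(urls, cores = 2L) {
  # Warm-up request in the parent process. In the transcript above, the
  # forked requests only succeeded after a call like this had been made.
  invisible(httr::GET(urls[[1]]))

  # Forked workers (macOS/Linux only; mc.cores > 1 is not supported on Windows).
  parallel::mclapply(urls, function(u) httr::GET(u), mc.cores = cores)
}

# Usage (needs network access):
# res <- parallel_get(rep("http://openalex.org/", 2))
```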

hadley commented 9 months ago

httr has been superseded by httr2, so no further development work will happen. I'd recommend giving httr2::req_perform_parallel() a go since it does parallel requests in a way that actually works (i.e. using curl's parallel request facilities).
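
A minimal sketch of what that could look like, assuming httr2 1.0.0 or later is installed. The helper names `build_requests()` and `fetch_parallel()` are hypothetical, and the URL is simply the one from the report above.

```r
# Hypothetical helper: turn a character vector of URLs into httr2 request objects.
build_requests <- function(urls) {
  lapply(urls, httr2::request)
}

# Hypothetical helper: perform all requests concurrently inside one R process.
# No fork() is involved, unlike parallel::mclapply().
fetch_parallel <- function(urls) {
  reqs <- build_requests(urls)
  # on_error = "continue" keeps going if one request fails; failed entries
  # come back as error objects instead of responses.
  httr2::req_perform_parallel(reqs, on_error = "continue")
}

# Usage (needs network access):
# resps <- fetch_parallel(rep("http://openalex.org/", 2))
# vapply(httr2::resps_successes(resps), httr2::resp_status, integer(1))
```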

rkrug commented 9 months ago

Thanks - I'll look into httr2. Although httr2::req_perform_parallel() is unfortunately not an option, as the call is in a package.

hadley commented 9 months ago

Why isn't it an option?

rkrug commented 9 months ago

It is not my package....

rkrug commented 9 months ago

Also, parallel calls can cause problems due to the restrictions of that specific API, so they need to be handled with care.
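
For code that can use httr2, one way to handle such restrictions with care is to throttle the requests. The sketch below assumes httr2 1.0.0 or later. The helper names are hypothetical, and the rate of one request per second is a placeholder, not the actual limit of any API mentioned here.

```r
# Hypothetical helper: build one throttled httr2 request per URL.
# per_second is a placeholder rate; check the API's documented limits.
build_throttled <- function(urls, per_second = 1) {
  lapply(urls, function(u) {
    httr2::req_throttle(httr2::request(u), rate = per_second)
  })
}

# Hypothetical helper: perform the requests one after another, so the
# throttle spaces them out instead of sending them all at once.
fetch_politely <- function(urls, per_second = 1) {
  reqs <- build_throttled(urls, per_second)
  httr2::req_perform_sequential(reqs, on_error = "continue")
}

# Usage (needs network access):
# resps <- fetch_politely(rep("http://openalex.org/", 2))
```

Sequential execution gives up the speed of parallel requests, but it keeps the request rate predictable.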

hadley commented 9 months ago

Oh got it.