posit-dev / positron

Positron, a next-generation data science IDE
Other
1.59k stars 43 forks source link

Problem with forked (multicore) code in ark #3817

Open joshuafayallen opened 1 week ago

joshuafayallen commented 1 week ago

Positron Version:

Positron Version: 2024.06.1 (Universal) build 27 Code - OSS Version: 1.90.0 Commit: a893e5b282612ccb2200102957ac38d3c14e5196 Date: 2024-06-26T02:08:06.673Z Electron: 29.4.0 Chromium: 122.0.6261.156 Node.js: 20.9.0 V8: 12.2.281.27-electron.0 OS: Darwin arm64 23.5.0

Steps to reproduce the issue:

Hi all I am not sure if this is just an issue on my end. But I was doing some analysis and kept running into a peculiar issue. For whatever reason positron keeps giving me an error. Whereas Vscode running radian does not.

#devtools::install_github("susanathey/MCPanel")
#devtools::install_github("apoorvalal/ebal")
#devtools::install_github("apoorvalal/synthdid")

library(synthdid)

data(PENN)

check = panel_estimate(PENN, unit_id = 'country', time_id = 'year',
outcome = 'log_gdp', treatment = 'dem', mccores = 5, infmethod = 'bootstrap'

What did you expect to happen?

check
#>                   DID Synthetic Control (SC) Synthetic DID (SDID)
#> Est        0.17587873            -0.02096869         -0.008799165
#> Std. Error 0.07857483             0.05651538          0.005963059
#>            Matrix Completion
#> Est              -0.01729725
#> Std. Error        0.01561886

Created on 2024-07-02 with reprex v2.1.0

Were there any error messages in the output or Developer Tools console?

Error in bootstrap_sample(): ! 'list' object cannot be coerced to type 'double' Hide Traceback ▆

  1. └─synthdid::panel_estimate(...)
  2. └─base::mapply(...)
  3. └─synthdid (local) <fn>(dots[[1L]][[1L]], dots[[2L]][[1L]])
  4. ├─stats::vcov(e, method = infmethod, ncores = mccores, replications = reps)
  5. └─synthdid:::vcov.synthdid_estimate(...)
  6. └─synthdid:::bootstrap_se(object, replications, ncores)
  7. ├─stats::sd(bootstrap_sample(estimate, replications, ncores))
  8. │ └─stats::var(...)
  9. │ └─base::is.data.frame(x)
    1. └─synthdid:::bootstrap_sample(estimate, replications, ncores)
juliasilge commented 6 days ago

I can reproduce this problem only in Positron (the code runs fine in RStudio). Here is the full traceback:

Error in `bootstrap_sample()`:
! 'list' object cannot be coerced to type 'double'
Hide Traceback
     ▆
  1. └─synthdid::panel_estimate(...)
  2.   └─base::mapply(...)
  3.     └─synthdid (local) `<fn>`(dots[[1L]][[1L]], dots[[2L]][[1L]])
  4.       ├─stats::vcov(e, method = infmethod, ncores = mccores, replications = reps)
  5.       └─synthdid:::vcov.synthdid_estimate(...)
  6.         └─synthdid:::bootstrap_se(object, replications, ncores)
  7.           ├─stats::sd(bootstrap_sample(estimate, replications, ncores))
  8.           │ └─stats::var(...)
  9.           │   └─base::is.data.frame(x)
 10.           └─synthdid:::bootstrap_sample(estimate, replications, ncores)
DavisVaughan commented 6 days ago

I see this when debugging it, there's a call to mcreplicate, which uses mclapply. Not entirely sure what's going on with the 5th core not returning a value

Screenshot 2024-07-03 at 3 17 22 PM
DavisVaughan commented 6 days ago

Minimal reprex

fn <- function(i) {
  if (i == 3) {
    cat("hiya")
  }
  i
}

parallel::mclapply(
  X = 1:10,
  FUN = fn,
  mc.cores = 5
)

Something about cat() while on the multicore works freaks the worker out, and I think it crashes? Note that I only call cat() on iteration 3 but it forces iteration 8 (same worker) to also not return a value.

> fn <- function(i) {
-   if (i == 3) {
-     cat("hiya")
-   }
-   i
- }
- 
- parallel::mclapply(
-   X = 1:10,
-   FUN = fn,
-   mc.cores = 5
- )
Warning message:
In parallel::mclapply(X = 1:10, FUN = fn, mc.cores = 5) :
  scheduled core 3 did not deliver a result, all values of the job will be affected
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
NULL

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

[[7]]
[1] 7

[[8]]
NULL

[[9]]
[1] 9

[[10]]
[1] 10
joshuafayallen commented 6 days ago

This may be anticipated but when I render a reprex it behaves as expected

fn <- function(i) {
  if (i == 3) {
    cat("hiya")
  }
  i
}

parallel::mclapply(
  X = 1:10,
  FUN = fn,
  mc.cores = 5
)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
#> 
#> [[4]]
#> [1] 4
#> 
#> [[5]]
#> [1] 5
#> 
#> [[6]]
#> [1] 6
#> 
#> [[7]]
#> [1] 7
#> 
#> [[8]]
#> [1] 8
#> 
#> [[9]]
#> [1] 9
#> 
#> [[10]]
#> [1] 10

Created on 2024-07-03 with reprex v2.1.0

Wheareas outside of a reprex I get the same error as Davis

DavisVaughan commented 6 days ago

I think it is going to be very unsafe to use forked (multicore) code in ark, at least until we can take a closer look at this and see if it is even possible to do this safely. It is highly likely that forking is going to interfere with the way that we communicate to our frontends through Jupyter comms. The cat() call writes to stdout() and it seems like that isn't working correctly, though I'm not quite sure why yet.

It should be safe to use multi-process parallelism, i.e. PSOCK clusters created with parallel::makePSOCKcluster() or future::multisession()


R docs in ?mcfork even say

It is strongly discouraged to use mcfork and the higher-level functions which rely on it (e.g., mcparallel, mclapply and pvec) in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes)

And I know RStudio has been fighting issues like this for a long time

DavisVaughan commented 6 days ago

@joshuafayallen if you control the call to mcreplicate and set refresh = FALSE to avoid the cat() call, it seems to work for this specific case, but I can't make any promises about everything else also working right.

# doesnt work
mcreplicate::mc_replicate(
  n = 5,
  expr = 1 + 1,
  mc.cores = 5
)

# works
mcreplicate::mc_replicate(
  n = 5,
  expr = 1 + 1,
  mc.cores = 5,
  refresh = FALSE
)
joshuafayallen commented 6 days ago

@DavisVaughan Thank you so much!

jennybc commented 6 days ago

This may be anticipated but when I render a reprex ...

Yeah reprex takes this off to an entirely separate process, so it's a totally different exercise, even when the reprex-ing happens in Positron.

joshuafayallen commented 6 days ago

@jennybc Yeah that makes sense. After spending a few hours when I went to render the reprex I thought I was losing my mind

kevinushey commented 4 days ago

Is there any hope of supporting this in ark via things like https://docs.rs/libc/latest/libc/fn.pthread_atfork.html to close the relevant communication channels in forked sub-processes? RStudio does something somewhat similar; we try to keep track of which process is the "main" process after a fork occurs, and let "child" processes continue execution but forbid certain things that only the main process should do.

lionel- commented 4 days ago

Or use the atfork hook to fail/crash with a proper error message? I'd be tempted to only support truly cross-platform and well specified approaches in ark. R can always be invoked in a separate process if fork-parallelism is needed.