lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/
377 stars 59 forks source link

`nthreads = 1` not working with clustered SEs #379

Closed arcruz0 closed 8 months ago

arcruz0 commented 1 year ago

Thanks for the great package!

I am running a routine in parallel, which includes fixest models with clustered SEs at each thread. I thought I'd use nthreads = 1 to ensure that each thread would run independently, but this does not have good performance. I noticed that using fixest::setFixest_nthreads(1) instead of nthreads = 1 does achieve the expected performance. See the MRE below, which borrows an example from the vignette.

library(fixest)
library(parallel)

my_threads <- 8
n_its <- 1024

system.time({
  results <- mclapply(1:n_its, function(x){
    setFixest_nthreads(1)
    fepois(Euros ~ log(dist_km), trade, vcov = ~Product)
  }, mc.cores = my_threads)
})
#>    user  system elapsed 
#>  15.314   3.070   7.869

system.time({
  results <- mclapply(1:n_its, function(x){
    fepois(Euros ~ log(dist_km), trade, vcov = ~Product, nthreads = 1)
  }, mc.cores = my_threads)
})
#>    user  system elapsed 
#> 181.498   6.317  18.958

I've only encountered this problem with clustered SEs. See below for an example without them, in which the two approaches are comparable.

system.time({
  results <- mclapply(1:n_its, function(x){
    setFixest_nthreads(1)
    fepois(Euros ~ log(dist_km), trade)
  }, mc.cores = my_threads)
})
#>    user  system elapsed 
#>  15.594   7.858   6.163

system.time({
  results <- mclapply(1:n_its, function(x){
    fepois(Euros ~ log(dist_km), trade, nthreads = 1)
  }, mc.cores = my_threads)
})
#>    user  system elapsed 
#>  19.816  10.987   6.904

The issue seems to get worse with more threads and more iterations. I can also reproduce it when using the future packages instead of parallel. Is it possible that, when using clustered SEs, some operation is still running in parallel even after setting nthreads = 1?

Apologies if I am misunderstanding anything here, and many thanks in advance!

Created on 2023-01-20 with reprex v2.0.2

Session info ``` r sessionInfo() #> R version 4.2.2 Patched (2022-11-10 r83330) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Pop!_OS 22.04 LTS #> #> Matrix products: default #> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 #> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0 #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C #> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 #> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> attached base packages: #> [1] parallel stats graphics grDevices utils datasets methods #> [8] base #> #> other attached packages: #> [1] fixest_0.11.1 #> #> loaded via a namespace (and not attached): #> [1] Rcpp_1.0.9 Formula_1.2-4 rstudioapi_0.14 #> [4] knitr_1.41 magrittr_2.0.3 lattice_0.20-45 #> [7] R.cache_0.16.0 rlang_1.0.6 fastmap_1.1.0 #> [10] stringr_1.5.0 styler_1.9.0 highr_0.10 #> [13] tools_4.2.2 grid_4.2.2 nlme_3.1-161 #> [16] xfun_0.36 R.oo_1.25.0 cli_3.6.0 #> [19] withr_2.5.0 htmltools_0.5.4 yaml_2.3.6 #> [22] digest_0.6.31 lifecycle_1.0.3 numDeriv_2016.8-1.1 #> [25] purrr_1.0.1 vctrs_0.5.1 R.utils_2.12.2 #> [28] fs_1.5.2 glue_1.6.2 evaluate_0.20 #> [31] rmarkdown_2.20 sandwich_3.0-2 reprex_2.0.2 #> [34] stringi_1.7.12 dreamerr_1.2.3 compiler_4.2.2 #> [37] R.methodsS3_1.8.2 zoo_1.8-11 ```
lrberge commented 1 year ago

Hi and thanks a lot for the detailed issue! It's so clear I could precisely locate the bug just by reading it. That's awesome! :-) In sum: you got this right. I'll fix it in the coming weeks.

lrberge commented 8 months ago

OMG it's been a year.... so sorry for the delay. Now fixed.

https://github.com/lrberge/fixest/commit/8900a004ff783d4c705ec686c8940452cf00ad27