tidyverse / purrr

A functional programming toolkit for R
https://purrr.tidyverse.org/
Other

map as rerun replacement has weird behavior? #1118

Closed lmiratrix closed 3 months ago

lmiratrix commented 8 months ago

I believe the following code should print "res" four times for the direct map call near the bottom (it does) and also for the run_sim call (it doesn't). Somehow the indices 1:reps in the map call are being passed through the ~ one_run() syntax and picked up by the data_only argument, and I am not sure why (at least on my system).


library( tidyverse )

one_run = function( one_sided = TRUE,
                    scaled_C = NULL,
                    perfect_X = FALSE, 
                    shuffle = FALSE,
                    data_only = FALSE, ...  ) {

    if ( data_only ) {
        return( paste0( "dta-", data_only ) )
    } else {
        return( "res" )
    }
}

run_sim <- function( reps, 
                     one_sided = TRUE,
                     scaled_C = NULL,
                     perfect_X = FALSE,
                     shuffle = FALSE, ... ) {
    runs <- 
        purrr::map( 1:reps, ~ one_run( one_sided = one_sided,
                                       scaled_C = scaled_C,
                                       perfect_X = perfect_X,
                                       shuffle = shuffle,
                                       ... ) ) 

    return( unlist( runs ) )
}

unlist( purrr::map( 1:4, ~ one_run() ) )

run_sim( reps = 4 )

Session info

```
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
 [7] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5 cli_3.6.2 rlang_1.1.3 stringi_1.8.3 pkgload_1.3.4
 [6] generics_0.1.3 glue_1.7.0 colorspace_2.1-0 hms_1.1.3 scales_1.3.0
[11] fansi_1.0.6 grid_4.3.2 munsell_0.5.0 tzdb_0.4.0 lifecycle_1.0.4
[16] compiler_4.3.2 timechange_0.2.0 pkgconfig_2.0.3 rstudioapi_0.15.0 R6_2.5.1
[21] tidyselect_1.2.0 utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3 tools_4.3.2
[26] withr_3.0.0 gtable_0.3.4
```

hadley commented 3 months ago

Could you please provide a simpler example that illustrates the problem? (Maybe just by deleting all the arguments that aren't actually used?)

lmiratrix commented 3 months ago

Ok, how about this:

library( tidyverse )

one_run = function( a_flag = FALSE, ...  ) {

    if ( a_flag ) {
        return( paste0( "dta-", a_flag ) )
    } else {
        return( "res" )
    }
}

run_sim <- function( reps, ... ) {
    purrr::map_chr( 1:reps, ~ one_run( ... ) ) 
}

purrr::map_chr( 1:4, ~ one_run() )

run_sim( reps = 4 )

run_sim( reps = 4, a_flag = FALSE )

It looks like the "..." in the one_run() is getting mapped to the anonymous function's initial argument, which is the counter from map?

hadley commented 3 months ago

I think you're running into the complexities of the ~ helper. Life gets easier if you switch to the new base R anonymous function shorthand:

library(purrr)

one_run <- function(a_flag = FALSE, ...) {
    if (a_flag ) {
      paste0("dta-", a_flag)
    } else {
      "res"
    }
}

run_sim <- function( reps, ... ) {
  purrr::map_chr(1:reps, \(i) one_run( ... )) 
}

run_sim(reps = 4, a_flag = TRUE)
#> [1] "dta-TRUE" "dta-TRUE" "dta-TRUE" "dta-TRUE"
run_sim(reps = 4, a_flag = FALSE)
#> [1] "res" "res" "res" "res"

Created on 2024-07-23 with reprex v2.1.0
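
For readers wondering why the ~ version picks up the counter, here is a minimal sketch of the mechanism. It assumes purrr 1.0.x and R 4.1 or later; show_dots(), formula_version() and lambda_version() are hypothetical helpers made up for this illustration and are not part of purrr. The point it tries to show is that a formula is converted into a function that has its own "..." argument, so "..." written inside the formula refers to the arguments map() passes to that function (the current element), not to the "..." of the enclosing run_sim().

```r
library(purrr)

# Hypothetical helper (illustration only): reports what arrived in `...`.
show_dots <- function(...) {
  args <- list(...)
  if (length(args) == 0) {
    "no args"
  } else {
    paste("got", paste(unlist(args), collapse = ", "))
  }
}

# The formula becomes a function whose own signature starts with `...`.
f <- as_mapper(~ show_dots(...))
names(formals(f))

# Inside a formula, `...` is the lambda's own dots, which hold the element
# supplied by map(). The caller's dots are shadowed.
formula_version <- function(reps, ...) {
  map_chr(1:reps, ~ show_dots(...))
}

# \(i) has no `...` of its own, so `...` is looked up in the enclosing
# function and forwards whatever the caller passed (here, nothing).
lambda_version <- function(reps, ...) {
  map_chr(1:reps, \(i) show_dots(...))
}

formula_version(2)
lambda_version(2)
```

In the original example the counter therefore lands in the first unmatched argument of one_run() (a_flag, or data_only in the longer version), which is why run_sim(reps = 4) does not return "res" four times.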

lmiratrix commented 3 months ago

Ok, so the "~" method has some weirdness; I will try to update my coding habits to \(i). I like the explicitness of the parameter, at least. :-)