tidyverse / purrr

A functional programming toolkit for R
https://purrr.tidyverse.org/

map-family: Allow explicit specification of attributes when using named functions in f. #1063

Closed werkstattcodes closed 1 year ago

werkstattcodes commented 1 year ago

This is a feature request.

The documentation of purrr 1.0.1 states that formula specifications of .f are "only recommended if you require backward compatibility with older versions of R." I read this as advice not to use them in other cases.

One of the advantages of the formula approach was the clear link between the inputs (e.g. .x and .y in map2) and where they were fed into .f (as .x and .y). As far as I can tell, this is currently not possible when passing a named function as .f (other than by wrapping it in a formula).

I hope my reprex below makes this sufficiently clear.

```r
library(tidyverse)

my_df <- tibble(col_num=1:3, col_a=rep("a", 3))

#named function
fn_comb <- function(h, i) {
  paste(h, i, sep="-")
}

#named function works, but without actually specifying where .x and .y are fed into.
my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=fn_comb))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a

#formula approach: I could clearly state where .x and .y went into.
my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=~fn_comb(h=.x, i=.y)))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a

#named function with argument specification => fails;  the example below seems to me to be the most intuitive. My feature request would be to enable such an approach.
my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=fn_comb(h=.x, i=.y)))
#> Error in `mutate()`:
#> ℹ In argument: `comb = map2_chr(.x = col_num, .y = col_a, .f = fn_comb(h
#>   = .x, i = .y))`.
#> Caused by error in `paste()`:
#> ! object '.x' not found

#> Backtrace:
#>      ▆
#>   1. ├─my_df %>% ...
#>   2. ├─dplyr::mutate(...)
#>   3. ├─dplyr:::mutate.data.frame(...)
#>   4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   5. │   ├─base::withCallingHandlers(...)
#>   6. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   7. │     └─mask$eval_all_mutate(quo)
#>   8. │       └─dplyr (local) eval()
#>   9. ├─purrr::map2_chr(...)
#>  10. │ └─purrr:::map2_("character", .x, .y, .f, ..., .progress = .progress)
#>  11. │   └─purrr::as_mapper(.f, ...)
#>  12. ├─global fn_comb(h = .x, i = .y)
#>  13. │ └─base::paste(h, i, sep = "-")
#>  14. └─base::.handleSimpleError(...)
#>  15.   └─dplyr (local) h(simpleError(msg, call))
#>  16.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
```

Created on 2023-03-17 with reprex v2.0.2
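
For readers wondering why the last call fails, here is a minimal base R sketch (no purrr needed). It assumes the failure comes from ordinary R evaluation, which is what the backtrace suggests: `fn_comb(h=.x, i=.y)` is a call rather than a function, so it is run as soon as the mapper needs the value of `.f`, before any per-element `.x` and `.y` exist. The names `call_error` and `wrapped` are made up for this illustration.

```r
fn_comb <- function(h, i) {
  paste(h, i, sep = "-")
}

# A bare function name is a function object that a mapper can call later.
is.function(fn_comb)

# fn_comb(h = .x, i = .y) is a call: evaluating it runs fn_comb() right away,
# and that fails because no object named .x exists at this point.
call_error <- tryCatch(
  fn_comb(h = .x, i = .y),
  error = function(e) conditionMessage(e)
)
call_error

# Wrapping the call in a function defers evaluation until the wrapper
# is called with actual values.
wrapped <- function(.x, .y) fn_comb(h = .x, i = .y)
wrapped(1, "a")
```

The formula `~fn_comb(h=.x, i=.y)` and an anonymous function both work for the same reason as `wrapped`: they delay the call until values are available.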


I am aware that I could wrap the named function in an anonymous function, but this would be only cosmetic and potentially deceptive, since there is no actual link between the names of the anonymous function's arguments and the names of the inputs to the map function (vectors, lists). Swapping the argument names leads to the same result:

```r
my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(.y, .x) fn_comb(h=.y, i=.x)))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a
```
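
The swap has no effect because `map2_chr()` hands its two inputs to `.f` by position. As a hedged sketch of one possible way to get a real name-based link with existing purrr functions, `pmap_chr()` matches the names of a named list to the argument names of `.f`. The variables `by_position` and `by_name` are illustrative only, and this is not necessarily what the maintainers would recommend.

```r
library(purrr)

fn_comb <- function(h, i) {
  paste(h, i, sep = "-")
}

col_num <- 1:3
col_a <- rep("a", 3)

# map2_chr() passes .x and .y to .f by position, so renaming the
# wrapper's parameters does not change which input goes where.
by_position <- map2_chr(col_num, col_a, \(.y, .x) fn_comb(h = .y, i = .x))

# pmap_chr() with a named list matches list names to argument names,
# so the order of the list elements does not matter.
by_name <- pmap_chr(list(i = col_a, h = col_num), fn_comb)

identical(by_position, by_name)
```

With `pmap_chr()`, misspelling an argument name in the list produces an error instead of silently binding the wrong input, which may be closer to the explicit link asked for here.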

I love purrr, but the "issue" laid out above is constantly itching me. Many thanks.

francisbarton commented 1 year ago

I'm not sure if this answers your question, but I take the gist of it to be in relation to the "clarity" of the link between the function and its arguments. I suppose with a named function, the "explicit specification of arguments" lives within the function definition. I don't really see how this is inferior to the formula style in your examples. Personally I'm finding the new anonymous function format really helpful in terms of being clear what is going into the function and where, but YMMV.

Secondly, I think you might be over-complicating your use of map* in your examples (though maybe that was just for the sake of providing examples). Since paste() is already vectorised, fn_comb can be applied to whole columns without a map. Here's how I would do what you are doing:

```r
my_df <- tibble::tibble(col_num = 1:3, col_a = rep("a", 3))

fn_comb <- function(h, i) paste(h, i, sep = "-")

# simple mutate (no map)
my_df |>
  dplyr::mutate(comb = fn_comb(col_num, col_a))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a

# use named function with map
my_df |>
  dplyr::mutate(comb = purrr::map2_chr(col_num, col_a, fn_comb))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a
```

Created on 2023-05-02 with reprex v2.0.2

I'm not sure I understand why seeing the names of the function arguments matters so much to you. If you have a function with two arguments and you feed it what it needs, do you need to be explicit about using .x and .y?

hadley commented 1 year ago

Thanks @francisbarton!