tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.77k stars 2.12k forks source link

`cur_column()` not working in `case_when()` LHS #6984

Closed shirdekel closed 3 months ago

shirdekel commented 9 months ago

I want to recode certain columns based on values of columns that have a similar naming. I'm using 1. case_when() to recode, 2. cur_column() to programmatically select the "related" columns, and 3. across() to automatically do this across the relevant columns.

As shown below, I can do this when specifying one column, but not with cur_column(). It complains that it must be used within across(), even though it is being used within across()

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

names_list <-
  c("x", "y", "color")

## works

starwars |>
  rename(color = hair_color) |>
  mutate(across(
    skin_color,
    \(x)
    case_when(
      x == "fair" ~ x,
      is.na(!!sym(na.omit(str_extract("skin_color", names_list)))) ~ NA,
      .default = "other"
    )
  )) |>
  select(color, skin_color)
#> # A tibble: 87 × 2
#>    color         skin_color
#>    <chr>         <chr>     
#>  1 blond         fair      
#>  2 <NA>          <NA>      
#>  3 <NA>          <NA>      
#>  4 none          other     
#>  5 brown         other     
#>  6 brown, grey   other     
#>  7 brown         other     
#>  8 <NA>          <NA>      
#>  9 black         other     
#> 10 auburn, white fair      
#> # ℹ 77 more rows

## doesn't work

starwars |>
  rename(color = hair_color) |>
  mutate(across(
    skin_color,
    \(x)
    case_when(
      x == "fair" ~ x,
      is.na(!!sym(na.omit(str_extract(cur_column(), names_list)))) ~ NA,
      .default = "other"
    )
  )) |>
  select(color, skin_color)
#> Error in `cur_column()`:
#> ! Must only be used inside `across()`.

Created on 2024-01-15 with reprex v2.1.0

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.2 (2023-10-31) #> os macOS Sonoma 14.2.1 #> system aarch64, darwin23.0.0 #> ui unknown #> language (EN) #> collate en_AU.UTF-8 #> ctype en_AU.UTF-8 #> tz Australia/Sydney #> date 2024-01-15 #> pandoc 2.19.2 @ /opt/homebrew/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.2) #> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.2) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.2) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.2) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.2) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.2) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.2) #> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.2) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.2) #> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.3.2) #> rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.2) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.2) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2) #> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2) #> stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.2) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.2) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.2) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.2) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> #> [1] /opt/homebrew/Cellar/r/4.3.2/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
nirguk commented 9 months ago

I think you can achieve your goal with use of pick() and pull()


(starwars |>
  rename(color = hair_color) |>
  mutate(across(
    skin_color,
    \(x)  case_when(
      x == "fair" ~ x,
      is.na(pull(pick(na.omit(str_extract(cur_column(), names_list))))) ~ NA,
      .default = "other"
    ) )) |>
  select(color, skin_color)
DavisVaughan commented 3 months ago

The !! forces cur_column() to be evaluated immediately, even before the mutate() call. This is a known limitation that we don't have a great solution for right now, but this feels like a somewhat niche case and what @nirguk suggests seems like it would be a bit cleaner anyways