tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/
Other
1.38k stars 418 forks source link

Enhance pivot_longer/wider with names_mutate #910

Closed jmh530 closed 4 years ago

jmh530 commented 4 years ago

I have a dataset that has character dates as some (but not all) of the column names. I can use pivot_longer to pivot to the shape I want, but this leaves the dates as characters. This is ok, but inhibits some kinds of analysis. To remedy this, I can then mutate those values in a post-processing step to get them as real dates. However, I presume this requires looping through the entire dataset. So if there are T of these columns, and N rows, then it requires NT mutations, rather than just T conversions normally. Ideally, there could be an option to pivot_longer/wider that would allow one to mutate/format the names before the final output.

library(tidyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

temp <- tibble(id = 1:5, 
               "12/31/2019" = runif(5), 
               "01/31/2020" = runif(5))

temp2 <- temp %>% 
    pivot_longer(
        cols = -id,
        names_to = "strdate",
        names_pattern = "(.*)",
        values_to = "value",
    ) %>% 
    mutate(Date = as.Date(strdate, "%m/%d/%Y"))

temp2
#> # A tibble: 10 x 4
#>       id strdate     value Date      
#>    <int> <chr>       <dbl> <date>    
#>  1     1 12/31/2019 0.863  2019-12-31
#>  2     1 01/31/2020 0.803  2020-01-31
#>  3     2 12/31/2019 0.136  2019-12-31
#>  4     2 01/31/2020 0.0246 2020-01-31
#>  5     3 12/31/2019 0.832  2019-12-31
#>  6     3 01/31/2020 0.356  2020-01-31
#>  7     4 12/31/2019 0.0934 2019-12-31
#>  8     4 01/31/2020 0.0233 2020-01-31
#>  9     5 12/31/2019 0.162  2019-12-31
#> 10     5 01/31/2020 0.569  2020-01-31

Created on 2020-03-10 by the reprex package (v0.3.0)

hadley commented 4 years ago

We may eventually consider some sort of on-the-fly parsing for this, but it's unlikely to have a noticeable performance improvement unless you have very large data.

hadley commented 4 years ago

If you wanted to, you could also do it yourself by using pivot_longer_spec