markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

In grouped dfs, mutate fails to coerce columns with the same name to another type #715

AltfunsMA closed this issue 1 year ago

AltfunsMA commented 1 year ago

Note below that hp can be coerced into the new column hp_chr, but with disp = as.character(disp), where the result keeps the existing column name, data.table gets itself all tangled up.

For dates (which was how I initially picked this up), the coercion unexpectedly leads to NAs.

pacman::p_load(tidytable)

set.seed(123)

days <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), nrow(mtcars))

mutate(mtcars, dates = days) %>% 
  group_by(cyl) %>% 
  mutate(dates = as.character(dates),
         hp_chr = as.character(hp),
         disp = as.character(disp),
         .keep = "used") %>% 
  ungroup()
#> Warning in `[.data.table`(~.df, , `:=`(c("dates", "hp_chr", "disp"), {: Coercing
#> 'character' RHS to 'double' to match the type of the target column (column 0
#> named '').
#> Warning in `[.data.table`(~.df, , `:=`(c("dates", "hp_chr", "disp"), {: NAs
#> introduced by coercion
#> Warning in `[.data.table`(~.df, , `:=`(c("dates", "hp_chr", "disp"), {: Coercing
#> 'character' RHS to 'double' to match the type of the target column (column 0
#> named '').
<---snip other warnings here---->

#> # A tidytable: 32 x 5
#>      cyl  disp    hp dates  hp_chr
#>    <dbl> <dbl> <dbl> <date> <chr> 
#>  1     6  160    110 NA     110   
#>  2     6  160    110 NA     110   
#>  3     4  108     93 NA     93    
#>  4     6  258    110 NA     110   
#>  5     8  360    175 NA     175   
#>  6     6  225    105 NA     105   
#>  7     8  360    245 NA     245   
#>  8     4  147.    62 NA     62    
#>  9     4  141.    95 NA     95    
#> 10     6  168.   123 NA     123   
#> # ... with 22 more rows

as.character(days)
#>  [1] "1999-06-28" "1999-01-14" "1999-07-14" "1999-11-02" "1999-04-28"
#>  [6] "1999-10-26" "1999-08-17" "1999-09-01" "1999-12-31" "1999-06-02"
#> [11] "1999-03-31" "1999-04-01" "1999-09-13" "1999-07-16" "1999-12-21"
#> [16] "1999-12-14" "1999-05-17" "1999-11-24" "1999-01-26" "1999-01-07"
#> [21] "1999-12-16" "1999-09-11" "1999-07-30" "1999-03-19" "1999-03-22"
#> [26] "1999-02-12" "1999-11-28" "1999-05-23" "1999-02-01" "1999-04-19"
#> [31] "1999-09-20" "1999-11-26"

markfairbanks commented 1 year ago

This is a quirk of data.table that I can't work around, unfortunately. It doesn't like type conversions on existing columns in grouped operations, for whatever reason.
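For anyone landing here, a minimal sketch of two possible ways to sidestep this, not an official fix. It assumes the conversion either does not need the groups at all (so it can run in an ungrouped mutate), or can be written to a new column name inside the grouped mutate and renamed afterwards, as hp_chr is in the reprex above. The names disp_chr, res_ungrouped and res_grouped are made up for illustration, and behavior may differ across tidytable and data.table versions.

```r
library(tidytable)

set.seed(123)
days <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"),
               nrow(mtcars))

# Option 1: do the type change in an ungrouped mutate, where the whole
# column is replaced and its type is free to change.
res_ungrouped <- mtcars %>%
  mutate(dates = days) %>%
  mutate(dates = as.character(dates),
         disp = as.character(disp))

# Option 2: if the computation really is grouped, write to a new name
# (disp_chr is a hypothetical name), then drop the old column and rename.
res_grouped <- mtcars %>%
  group_by(cyl) %>%
  mutate(disp_chr = as.character(disp)) %>%
  ungroup() %>%
  select(-disp) %>%
  rename(disp = disp_chr)
```

Option 1 is the simpler choice whenever the conversion is row-wise, as as.character() is here. Option 2 keeps the grouped step but never asks data.table to change the type of a column that already exists.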