markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

`mutate(row_number())` fails on a 0 row data frame when overwriting existing column #639

Closed jfdesomzee closed 1 year ago

jfdesomzee commented 1 year ago

Hello,

I get an error when I create a variable with an existing name in a data table with 0 rows. Any idea how I could make this work?

require(magrittr)
#> Loading required package: magrittr
require(tidytable)
#> Loading required package: tidytable
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.
#> 
#> Attaching package: 'tidytable'
#> The following object is masked from 'package:magrittr':
#> 
#>     extract
#> The following objects are masked from 'package:stats':
#> 
#>     dt, filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     %in%, ifelse

iris %>%
  tidytable::as_tidytable() %>%
  tidytable::filter(FALSE) %>% 
  tidytable::mutate.(.df=.,
                     Sepal.Length2=1:.N)
#> # A tidytable: 0 x 6
#> # ... with 6 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> #   Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>, Sepal.Length2 <list>

iris %>%
  tidytable::as_tidytable() %>%
  tidytable::filter(1:.N<5) %>% 
  tidytable::mutate.(.df=.,
                     Sepal.Length=1:.N)
#> # A tidytable: 4 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <int>       <dbl>        <dbl>       <dbl> <fct>  
#> 1            1         3.5          1.4         0.2 setosa 
#> 2            2         3            1.4         0.2 setosa 
#> 3            3         3.2          1.3         0.2 setosa 
#> 4            4         3.1          1.5         0.2 setosa

iris %>%
  tidytable::as_tidytable() %>%
  tidytable::filter(FALSE) %>% 
  tidytable::mutate.(.df=.,
                     Sepal.Length=1:.N)
#> Error:
#> ! Can't recycle input of size 2 to size 0.

#> Backtrace:
#>      x
#>   1. +-... %>% tidytable::mutate.(.df = ., Sepal.Length = 1:.N)
#>   2. +-tidytable::mutate.(.df = ., Sepal.Length = 1:.N)
#>   3. +-tidytable:::mutate..data.frame(.df = ., Sepal.Length = 1:.N)
#>   4. | +-tidytable::mutate(...)
#>   5. | \-tidytable:::mutate.tidytable(...)
#>   6. |   \-rlang::eval_tidy(dt_expr, .df, dt_env)
#>   7. +-.df[, `:=`(Sepal.Length = vctrs::vec_recycle(1:.N, .N))]
#>   8. +-data.table:::`[.data.table`(...)
#>   9. | \-base::eval(jsub, SDenv, parent.frame())
#>  10. |   \-base::eval(jsub, SDenv, parent.frame())
#>  11. |     \-vctrs::vec_recycle(1:.N, .N)
#>  12. \-vctrs:::stop_recycle_incompatible_size(...)
#>  13.   \-vctrs:::stop_vctrs(...)
#>  14.     \-rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))

markfairbanks commented 1 year ago

I don't know if this is a bug - this errors in dplyr as well:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = integer(), y = character())

df %>%
  dplyr::mutate(x = 1:n())
#> Error in `dplyr::mutate()`:
#> ℹ In argument: `x = 1:n()`.
#> Caused by error:
#> ! `x` must be size 0 or 1, not 2.

#> Backtrace:
#>      ▆
#>   1. ├─df %>% dplyr::mutate(x = 1:n())
#>   2. ├─dplyr::mutate(., x = 1:n())
#>   3. ├─dplyr:::mutate.data.frame(., x = 1:n())
#>   4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...))
#>   5. │   ├─base::withCallingHandlers(...)
#>   6. │   └─dplyr:::mutate_col(dots[[i]], .data, mask, new_columns)
#>   7. │     └─mask$eval_all_mutate(quo)
#>   8. │       └─dplyr (local) eval()
#>   9. ├─dplyr:::dplyr_internal_error(...)
#>  10. │ └─rlang::abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
#>  11. │   └─rlang:::signal_abort(cnd, .file)
#>  12. │     └─base::signalCondition(cnd)
#>  13. └─dplyr (local) `<fn>`(`<dpl:::__>`)
#>  14.   └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

markfairbanks commented 1 year ago

I'm going to close this - I don't think this is something that should be changed if it also fails in dplyr.

If you have any other questions around it let me know.

markfairbanks commented 1 year ago

I think the recommended way to deal with this would be to pass a vector of length 0 to mutate():

library(tidytable, warn.conflicts = FALSE)
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.

iris %>%
  as_tidytable() %>%
  filter(FALSE) %>%
  mutate(Sepal.Length=integer())
#> # A tidytable: 0 × 5
#> # … with 5 variables: Sepal.Length <int>, Sepal.Width <dbl>,
#> #   Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>

markfairbanks commented 1 year ago

Also FYI - you don't need to use require(magrittr) if you're only using it for the %>% pipe. tidytable reexports %>%.

jfdesomzee commented 1 year ago

However, this does not fail in dplyr:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = integer(), y = character())

df %>%
  dplyr::mutate(x = row_number())
#> # A tibble: 0 x 2
#> # ... with 2 variables: x <int>, y <chr>

markfairbanks commented 1 year ago

Ah gotcha - 1:n() fails because on a 0 row data frame it's essentially calling 1:0, which counts down and gives a vector of length 2 (c(1, 0)). dplyr's row_number(), on the other hand, is doing something different in the background.

In tidytable row_number() does basically call 1:n().
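
To make the 1:0 point concrete, here is a minimal base R sketch. The names n, colon_result and safe_result are made up for illustration, and this is not meant to show how tidytable or dplyr implement row_number() internally; it only shows why a colon sequence is unsafe when the row count can be 0 and how seq_len() avoids that.

```r
# Base R only; no packages needed.
n <- 0L  # stand-in for the row count of an empty data frame

# The colon operator counts down when the right side is smaller,
# so 1:0 is c(1L, 0L) rather than an empty vector.
colon_result <- 1:n
length(colon_result)
#> [1] 2

# seq_len() is the zero-safe way to build 1, 2, ..., n.
safe_result <- seq_len(n)
length(safe_result)
#> [1] 0

# For n >= 1 the two forms agree.
identical(1:5, seq_len(5L))
#> [1] TRUE
```

Since a length-0 vector is what mutate() accepts on a 0 row data frame (as with integer() above), a seq_len()-style sequence is the kind of value that should recycle cleanly there, whereas the length-2 result of 1:0 cannot.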

markfairbanks commented 1 year ago

All set.

pacman::p_load(tidytable)

df <- tidytable(x = integer(), y = character())

df %>%
  mutate(x = row_number())
#> # A tidytable: 0 × 2
#> # … with 2 variables: x <int>, y <chr>