markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/
Other
450 stars 32 forks source link

Tidytable %in% is not backwards-compatible with base::`%in%` #632

Closed krterberg closed 2 years ago

krterberg commented 2 years ago

Hey, First of all thanks for the fantastic package.

After the most recent update we ran into this:

1 %in% "1"
#> [1] TRUE
library(tidytable)
1 %in% "1"
#> Error:
#> ! Can't combine <double> and <character>.

#> Backtrace:
#>     ▆
#>  1. └─1 %in% "1"
#>  2.   └─vctrs::vec_in(x, y)
#>  3.     └─vctrs (local) `<fn>`()
#>  4.       └─vctrs::vec_default_ptype2(...)
#>  5.         └─vctrs::stop_incompatible_type(...)
#>  6.           └─vctrs:::stop_incompatible(...)
#>  7.             └─vctrs:::stop_vctrs(...)
#>  8.               └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))

Created on 2022-09-26 with reprex v2.0.2

I understand that exporting a different version of %in% was your intention, but I have a few reservations:

  1. %in% normally dispatches to base::match(), rather than having behavior of its own.
  2. I don't usually expect to get vctrs errors or rlang backtraces from code outside of the scope of a tidyverse library.
  3. The replacement is not compatible with the base function, by dispatching to vctrs::vec_in() it becomes narrower in what combinations of types it accepts, even though those types previously need not have produced an error.

My suggestion would be to have an internal version of %in% for use inside tidytable verbs, and dispatch to that using the expression rewriting rules found in prep_expr_call().

I'll attach a pull request shorty.

I'd love to hear your thoughts and whether other users feel this is a problem worth adressing or not.

markfairbanks commented 2 years ago

My suggestion would be to have an internal version of %in% for use inside tidytable verbs I don't usually expect to get vctrs errors or rlang backtraces from code outside of the scope of a tidyverse library.

Hmm but wouldn't this be more confusing to get an error? Currently tidytable overwrites %in%, and that change is visible to the user. Let's say it's moved into prep_expr_call() and the user gets an error - wouldn't they be more confused because they wouldn't know tidytable makes that change internally?

Not trying to say you're wrong by the way - I see your point. But I feel like the internal translation would lead to more confusion for users.

To me it seems like these are the options: 1) Keep the current behavior 2) Drop tidytable::'%in%' and replace it with another option (like %fin% or %in.%) with no internal translation

markfairbanks commented 2 years ago

Actually I think the option I like most:

  1. Create special branches for this case

Edit: This might be a bad idea though - I'm not positive how many branches would be needed.

krterberg commented 2 years ago

All valid options. I didn't go with the special branch solution because I'm not sure how to detect whether the call is made inside of a tidytable verb. If you have any pointers I'd be happy to try and implement it.

markfairbanks commented 2 years ago

What do you think of this?

Basically - run a check to see if vctrs::vec_in() will cause an error. If it will it branches to use base::'%in%'.

'%in%' <- function(x, y) {
  if (is.character(x) && is.character(y)) {
    x %chin% y
  } else if (needs_base_in(x, y)) {
    base::'%in%'(x, y)
  } else {
    vec_in(x, y)
  }
}

needs_base_in <- function(x, y) {
  # https://github.com/markfairbanks/tidytable/issues/565
  # https://github.com/markfairbanks/tidytable/issues/632
  tryCatch({vec_ptype_common(x, y); FALSE}, error = function(e) TRUE)
}
krterberg commented 2 years ago

Oh very clever! Probably warrants a note in the docs in case someone unexpectedly encounters the performance degradation but otherwise seems like an elegant solution.

Thinking about it a bit more broadly, I do think you'd be accepting a risk that any bugs, regressions or differences in implementation between vec_in and base::%in% could cause users to come complaining, because tidytable chooses to apply this mask. Vctrs is pretty high quality software so this doesn't need to be too much of a concern though.

markfairbanks commented 2 years ago

Great, sounds good. Let's implement it this way then.

krterberg commented 2 years ago

Added the suggested implementation, a note under details and a test.

markfairbanks commented 2 years ago

Awesome, thanks 😄

I'll take a look.

krterberg commented 2 years ago

Thank you for your time!