Open DavisVaughan opened 1 year ago
Just chiming in here to say: `.missing` argument! Perhaps also a warning when there are NA rows that are to be dropped when `.missing` is not supplied.
@williamlai2 I think the current behavior is right for most people, so I doubt we'd want to add a warning by default for something that people typically expect to happen
Is it expected though? Judging by the links in your first post I don't think it is. I'm a long time user and had this issue yesterday. I didn't realise it was happening until I noticed a lot fewer rows than expected and had to look up the solution.
@williamlai2 we’re not reconsidering the default behaviour at this time.
I had a similar issue in {powerjoin} and addressed it by defining new operators for extended equality and rowwise `%in%`, which makes it a bit more flexible since we don't have to commit to the behavior for all arguments. I do see the value of the `.missing` arg though.
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
  x = c(TRUE, FALSE, NA, NA, NA),
  y = c(NA, TRUE, NA, NA, NA),
  z = c(TRUE, TRUE, TRUE, FALSE, NA)
)
# extended equality "bone operator": NA equals NA, and NA never equals a value
`%==%` <- function(x, y) {
  is.na(x) & is.na(y) | !is.na(x) & !is.na(y) & x == y
}
# keep rows where none of x, y, z is FALSE (NA counts as "not FALSE")
filter(df, ! x %==% FALSE, ! y %==% FALSE, ! z %==% FALSE)
#> # A tibble: 3 × 3
#>   x     y     z
#>   <lgl> <lgl> <lgl>
#> 1 TRUE  NA    TRUE
#> 2 NA    NA    TRUE
#> 3 NA    NA    NA
# row-wise `%in%`
`%in.%` <- function(x, y) {
  conds <- lapply(y, function(yi) x %==% yi)
  Reduce(`|`, conds)
}
filter(df, ! FALSE %in.% list(x, y, z))
#> # A tibble: 3 × 3
#>   x     y     z
#>   <lgl> <lgl> <lgl>
#> 1 TRUE  NA    TRUE
#> 2 NA    NA    TRUE
#> 3 NA    NA    NA
Created on 2023-04-29 with reprex v2.0.2
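Currently, `filter()` keeps rows where the condition is `TRUE` and drops rows where it is `FALSE` and `NA`, which matches `subset()`.

A number of requests have come up in the past desiring to keep rows where the condition is `TRUE` and `NA` and drop only `FALSE` (compare `[`).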
Here are a few:
This is most apparently annoying when you have multiple columns to filter by
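As a small illustration of that annoyance, here is a sketch with made-up data (the tibble and column names are assumptions, not taken from the linked requests). Keeping `NA` rows today means repeating `| is.na()` for every column you filter by:

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up example data, for illustration only.
df <- tibble(
  x = c(TRUE, FALSE, NA, NA),
  y = c(TRUE, NA, TRUE, FALSE)
)

# Default behavior: a row is dropped as soon as any condition is FALSE or NA.
default_result <- filter(df, x, y)

# Keeping NA rows requires spelling out `| is.na()` once per column.
keep_na_result <- filter(df, x | is.na(x), y | is.na(y))

nrow(default_result)
#> [1] 1
nrow(keep_na_result)
#> [1] 2
```

With two columns this is merely verbose; with many columns the repeated `is.na()` calls start to dominate the call.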
I propose a `.missing = c("drop", "keep", "error")` argument to `filter()` that would allow you to optionally keep rows with `NA`.

We'd have to carefully analyze the boolean algebra here to make sure we are being consistent. In particular I think we want to make sure these are the same if we do this, but I think they are:
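One way to check this kind of consistency, assuming the comparison is between passing two conditions separately and combining them with `&` first, is to enumerate every `TRUE` / `FALSE` / `NA` combination in base R. The helpers `na_as_true()` and `na_as_false()` are hypothetical stand-ins for `"keep"` and `"drop"`, not dplyr functions:

```r
# All nine combinations of TRUE / FALSE / NA for two conditions.
vals <- c(TRUE, FALSE, NA)
grid <- expand.grid(x = vals, y = vals)

# Hypothetical stand-ins for the proposed options:
# "keep" treats NA as TRUE, "drop" treats NA as FALSE.
na_as_true <- function(cond) cond | is.na(cond)
na_as_false <- function(cond) !is.na(cond) & cond

# Missingness resolved per condition, then combined ...
keep_separate <- na_as_true(grid$x) & na_as_true(grid$y)
drop_separate <- na_as_false(grid$x) & na_as_false(grid$y)

# ... versus conditions combined with `&` first, then resolved.
keep_combined <- na_as_true(grid$x & grid$y)
drop_combined <- na_as_false(grid$x & grid$y)

all(keep_separate == keep_combined)
#> [1] TRUE
all(drop_separate == drop_combined)
#> [1] TRUE
```

Under that reading, both orders agree on all nine combinations. This only covers two conditions joined by `&`, so it is a sanity check rather than a full analysis.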
The `"drop"` case is probably already consistent because that is what we do today, and the `"keep"` case is probably like this, which seems consistent

When we do this, we should also think about whether `vec_pall()` or `vec_pany()` could be used in `filter()` in any way, since they are heavily optimized for performance.
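To make the three proposed options concrete, here is a rough sketch of how they might behave, written as a wrapper around today's `filter()`. `filter_missing()` is a hypothetical helper, not part of dplyr, and it handles only a single condition, whereas the proposal would apply to every condition passed to `filter()`:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper, NOT a dplyr function: emulates the proposed
# `.missing` options for one logical condition.
filter_missing <- function(data, cond, .missing = c("drop", "keep", "error")) {
  .missing <- match.arg(.missing)
  switch(
    .missing,
    # Today's behavior: NA rows are dropped.
    drop = filter(data, {{ cond }}),
    # Treat NA as TRUE so those rows are kept.
    keep = filter(data, {{ cond }} | is.na({{ cond }})),
    # Refuse to filter if the condition is NA anywhere.
    error = {
      flags <- pull(mutate(data, .flag = {{ cond }}), .flag)
      if (anyNA(flags)) {
        stop("Condition evaluated to NA for ", sum(is.na(flags)), " row(s).")
      }
      filter(data, {{ cond }})
    }
  )
}

df <- tibble(x = c(TRUE, FALSE, NA))

nrow(filter_missing(df, x))
#> [1] 1
nrow(filter_missing(df, x, .missing = "keep"))
#> [1] 2
tryCatch(
  filter_missing(df, x, .missing = "error"),
  error = function(e) conditionMessage(e)
)
#> [1] "Condition evaluated to NA for 1 row(s)."
```

A real implementation would presumably live inside `filter()` itself and combine all conditions in one pass, which is where something like `vec_pall()` might come in.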