Closed DavisVaughan closed 2 years ago
If coalesce()
errors with 0 inputs, wouldn't you expect coalesce(NULL)
to error? I think our general philosophy is that foo(x, NULL,NULL)
should be equivalent to foo(x, ,)
, which is generally equivalent to foo()
. I'd prefer to stick to that principle here, if we can.
If coalesce() errors with 0 inputs, wouldn't you expect coalesce(NULL) to error?
They both error, just with different errors:
coalesce()
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(NULL)
#> Error in `coalesce()`:
#> ! `..1` must be a vector, not NULL.
I think our general philosophy is that foo(x, NULL,NULL) should be equivalent to foo(x, ,)
Yea this is what I was alluding to at the end of the second bullet point: "An alternative to this is to drop all NULL
inputs at the beginning". I can do that, that would make coalesce(NULL)
fail "automatically" because there would be 0 inputs after removing the nulls.
Are you ok with coalesce(NULL, 1:5)
returning 1:5
? I guess that is the main question here. (Seems consistent with the idea that NULL
inputs are basically equivalent to not providing anything, as you suggested)
Okay we now stick to this principle:
philosophy is that
foo(x, NULL,NULL)
should be equivalent tofoo(x, ,)
, which is generally equivalent tofoo(x)
Giving us:
coalesce()
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(NULL)
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(c(1, NA), NULL, 2)
#> [1] 1 2
I am fairly certain this is what you wanted, so I'll merge this. Feel free to follow up if not.
Yeah, that's what I was thinking, thanks!
But reading through #64 again, I do find your arguments there compelling, even if it would make coalesce()
less consistent with SQL. It's hard to imagine a case where you'd want to recycle the first argument to match the length of the second, and then you might as well apply that principle to the types as well. (And I think the only place where that makes a significant difference is whether coalesce(c(NA, 1L), 1))
returns an integer or double)
It does mean you can't splice in all of the inputs at once anymore
coalesce2 <- function(x, ...) {
funs::coalesce(x, ...)
}
vecs <- list(
c(1, 2, NA, NA, 5),
c(NA, NA, 3, 4, 5)
)
coalesce2(!!!vecs)
#> Error in !vecs: invalid argument type
funs::coalesce(!!!vecs)
#> [1] 1 2 3 4 5
I think the main question is deciding which of these two is the main use case:
coalesce(x, 0)
(i.e. you have a primary first input)coalesce(x, y, z)
(i.e. all inputs are the same to you)The first favors a signature of coalesce(x, ...)
and the second favors coalesce(...)
.
The other way it is described by most of the SQL docs I found is as syntactic sugar for case_when()
, which is why it takes the common type in SQL:
COALESCE(expression1, expression2, ..., expressionN)
CASE
WHEN (expression1 IS NOT NULL) THEN expression1
WHEN (expression2 IS NOT NULL) THEN expression2
...
ELSE expressionN
END
Assuming you have GitHub code search: https://cs.github.com/?scopeName=All+repos&scope=&q=language%3Ar+coalesce%28%21%21%21
Approximately 20-ish distinct uses of coalesce(!!!<stuff>)
that would break with a coalesce(x, ...)
signature
Oh I forgot about the !!!
case 😞. We do have a hack from aes()
that we could apply here, but I don't think it's worth it.
FWIW I do think in coalesce(x, y, z)
x
is still primary, since that's the vector that you're filling in.
I guess we could still make the first argument of ...
primary, even if we don't give it its own argument. But that starts to maybe feel maybe inconsistent? What other functions do we have that are like coalesce()
? pmin()
and pmax()
are similar, which connects us to the set of functions suggested in #5. I guess it's also related to case_when()
and if_else()
. if_else()
uses the size of the first argument, but the common type of all four.
Base R's pmin()
and pmax()
pull attributes from the first input, but I don't think we should let that behavior sway us.
I'd argue that ideally pmin()
, pmax()
, case_when()
, and if_else()
should all take the common type by default. I do think all 4 should have optional ptype
and size
arguments to override the default common type/size computation. That was an improvement I was planning on doing when I rewrite case_when()
and if_else()
with vctrs.
The main difference between those 4 and coalesce()
is that coalesce()
seems to have a primary input, and those don't.
However, if we stick with computing the common type for the 4 functions mention above, then this comment would indeed feel inconsistent, as you suggested:
make the first argument of
...
primary
Ok, that reasoning sounds solid to me.
One more thought: we should probably add a check_dots_unnamed()
here.
Re: check_dots_unnamed()
There are also a number of cases that code search brought up where people are splicing in data frames. Like they used select()
to tidyselect some columns and then spliced those in. So this would break that :/
Here is one example: https://cs.github.com/CorrelAid/projectutils/blob/a2984cc11ca1c88aeb3436c8b661f16356b1903d/R/surveymonkey.R#L127
Oh bummer 😞
Closes #48 Closes #64
A few changes to
coalesce()
as I prep to port this over to dplyr.For #48, over slack we decided that we'd rather keep
coalesce()
simple and always use vctrs invariants, so when given data frames as input it should only ever update entirely missing rows (that is whatvec_equal_na()
thinks is "missing"). If you want to coalesce by column then you should really justmap2()
orpmap()
over the data frames, callingcoalesce()
as the function to use on each column. I have added a few tests to reflect this expectation.For #64, I had suggested we switch to
coalesce(x, ...)
and always cast to the type ofx
and recycle to the size ofx
. I did some research, and the SQL standards state thatcoalesce()
should cast to the common type of all of its inputs. See this link for one example where that is documented https://docs.microsoft.com/en-us/sql/t-sql/language-elements/coalesce-transact-sql?view=sql-server-ver15#return-types. So I think we should keep that behavior by default. It sort of makes sense if you have >=2 full vectors and you don't really have a "primary" input, likecoalesce(x, y, z)
. My original use case was something likecoalesce(x, 0)
, where you'd want to retain the type ofx
.To optionally enforce type and size stability in cases like
coalesce(x, 0)
, I've added.ptype
and.size
arguments. I think this is good enough for what I wanted.I've added
list_check_all_vectors()
to check that all inputs are vectors. This disallowsNULL
values in...
. I think this is a good idea becausecoalesce(NULL, .size = 5)
would need to return something of size 5 and there is no way to do that without typed inputs. An alternative to this is to drop allNULL
inputs at the beginning, but I have a feeling aNULL
input would be a user error in this function, so I prefer an error. We can change to this approach in the future if this proves to be too strict (better to start strict, I think?).I've required at least 1 input, rather than returning
NULL
. This is the current dplyr behavior and makes sense becausecoalesce(.size = 5)
would have to return something of size 5, and there is no way to do that if no inputs are given.