r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org
Other
287 stars 66 forks source link

Return `NA` from `max/min.vctrs_vctr` with empty vectors #1004

Open DavisVaughan opened 4 years ago

DavisVaughan commented 4 years ago

Currently, max.vctrs_vctr tries to cast -Inf to the type of x if x is empty. I'm not sure that that makes sense in all cases, and it assumes that there is a cast method from double to the type of x (and that that cast method works with infinity).

For example, we can generally take the max/min of a character vctr, but if it is empty we get an error. It doesn't error for pmax() because its size invariant is different. max() is required to return an object of size 1, pmax() returns an object of the common size of all its input.

library(vctrs)

y <- new_vctr("x")
max(y)
#> <vctrs_vctr[1]>
#> [1] x
pmax(y)
#> <vctrs_vctr[1]>
#> [1] x

x <- new_vctr(character())
max(x)
#> Error: Can't convert <double> to <vctrs_vctr>.
pmax(x)
#> <vctrs_vctr[0]>

I think I would argue that for vctrs_vctr objects, we should return a typed NA for max() and min() if the input is empty. This would keep it from ever erroring, wouldn't require cast methods to work, and wouldn't be a completely new idea because it's what max() does with empty character input.

max(integer())
#> Warning in max(integer()): no non-missing arguments to max; returning -Inf
#> [1] -Inf
max(numeric())
#> Warning in max(numeric()): no non-missing arguments to max; returning -Inf
#> [1] -Inf
max(character())
#> Warning in max(character()): no non-missing arguments, returning NA
#> [1] NA

For numeric/integer vctrs_vctr objects, returning a typed NA would be different from R returning -Inf for numerics, but I think that is ok. If a developer is creating a vctrs_vctr subclass on top of a numeric, I don't think we can assume it should behave the same way as a base R numeric here, and returning NA in all cases is the most consistent thing we can do

lionel- commented 3 years ago

The problem of returning NA is that we are losing the information that an empty vector doesn't contain any value that might be lower or larger than a future input.

DavisVaughan commented 3 years ago

It really seems like the minimum and maximum allowed values of a specific ptype are specific to that ptype, and can often be impossible to guess using general heuristics. Maybe we need something like vec_maximum(ptype) and vec_minimum(ptype) which could be generic and return the min/max values of the given ptype?

For example, in clock the minimum allowed calendar year is around year -32768 but there is no R level heuristic that can tell you that (it's the minimum value of a C++ short)

lionel- commented 3 years ago

The base methods emit a warning. Maybe we should just fail?