Add `apply()` functions for xts/zoo objects

joshuaulrich commented 1 year ago

These would be generic functions that make row/column calculations on xts objects easier than calling apply() and then converting the result back to xts. This idea was prompted by #281 and by many users via email, stackoverflow, etc. over the years.

A proof-of-concept implementation of both is below. I can't decide between the names rowapply() and applyrows(), or either with a separator between the words.

rowapply.xts <- function(x, func. = NULL, ...)
{
    a_out <- apply(X = x, MARGIN = 1, FUN = func., ...)

    result_is_vector <- is.null(dim(a_out))
    if (result_is_vector) {
        a_out <- matrix(a_out, ncol = 1)
        # set column name to func. if is.name(func.) is TRUE?
    } else {
        # any additional processing when a_out has multiple columns
        a_out <- t(a_out)
    }
    x_out <- .xts(a_out, .index(x))
    xtsAttributes(x_out) <- xtsAttributes(x)

    return(x_out)
}

colapply.xts <- function(x, func. = NULL, ...)
{
    a_out <- apply(X = x, MARGIN = 2, FUN = func., ...)

    result_is_vector <- is.null(dim(a_out))
    if (result_is_vector) {
        a_out <- matrix(a_out, nrow = 1)
        x_out <- xts(a_out, end(x))     # only one row; set index to last value
        colnames(x_out) <- colnames(x)  # re-use input column names
    } else {
        stop("I have no idea what to do in this case")
    }
    xtsAttributes(x_out) <- xtsAttributes(x)

    return(x_out)
}

And some example use cases:

library(xts)
data(sample_matrix)
x <- head(as.xts(sample_matrix), 10)

rowapply.xts(x, sum)       # function returns scalar
##                [,1]
## 2007-01-02 200.2258
## 2007-01-03 201.2805
## 2007-01-04 201.4384
## 2007-01-05 201.3026
## 2007-01-06 200.7810
## 2007-01-07 200.3314
## 2007-01-08 200.0970
## 2007-01-09 199.7076
## 2007-01-10 199.9276
## 2007-01-11 200.2488

rowapply.xts(x, quantile)  # function returns vector
##                  0%      25%      50%      75%     100%
## 2007-01-02 49.95041 50.01744 50.07878 50.11778 50.11778
## 2007-01-03 50.23050 50.23050 50.31408 50.40372 50.42188
## 2007-01-04 50.26414 50.31530 50.37666 50.42096 50.42096
## 2007-01-05 50.22103 50.30620 50.35403 50.37347 50.37347
## 2007-01-06 50.11121 50.16364 50.21272 50.24433 50.24433
## 2007-01-07 49.99185 49.99185 50.06198 50.15299 50.21561
## 2007-01-08 49.96971 49.98347 50.01181 50.05257 50.10363
## 2007-01-09 49.80454 49.88613 49.95411 49.99489 49.99489
## 2007-01-10 49.91228 49.91228 49.94237 50.01198 50.13053
## 2007-01-11 49.88529 49.88529 50.06220 50.23910 50.23910

colapply.xts(x, sum)       # function returns scalar
##                Open     High     Low    Close
## 2007-01-11 501.2691 502.2622 500.341 501.4683

#colapply.xts(x, quantile)  # no idea what to do here
# this is the apply() output
apply(x, 2, quantile)
##           Open     High      Low    Close
##  0%   49.88529 49.99489 49.80454 49.91333
##  25%  50.00505 50.12096 49.92182 49.98901
##  50%  50.08595 50.22736 49.98078 50.14945
##  75%  50.24087 50.34118 50.19358 50.30904
##  100% 50.42096 50.42188 50.26414 50.39767

zeileis commented 1 year ago

I agree that this is a useful functionality, thanks for suggesting an implementation. I also think that it would be good if xts and zoo added this in a compatible way. Some comments:

apply() generic:
I just wanted to mention this possibility. Rather than introducing new generics we could also provide our own zoo::apply() whose default method just calls base::apply(). This would probably be most convenient for many users but shadowing functions from base is always potentially dangerous.

Names of new generics and arguments:
If we go for new generics, then I like rowapply() and colapply() which is similar to rowMeans(), rowSums() etc. except for the camel case. I would be rather strongly in favor of calling the function argument FUN and not func.. This is what is most frequently used in apply-type functionality, I think.

colapply() with vector return value:
There are two possibilities I can think of: (a) Returning a plain matrix with a warning that the result is not a time series object anymore. (b) Flatten the matrix with . type names. I would probably use (b) because (a) can be obtained by apply() rather than colapply().

Default time index:
In colapply() we could add an argument that controls whether the result is anchored at the end or the beginning or somewhere in between. I think the end is the most natural default but I would expect that other choices will also be requested.
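
To make the last two points concrete, here is a minimal sketch of a colapply() variant that takes FUN and a time-index argument. The function name colapply_sketch() and the argument name anchor are hypothetical and not part of xts or zoo, and only the "end" and "start" choices are covered, not "somewhere in between".

```r
library(xts)

# Hypothetical variant of the proof-of-concept colapply.xts(): the function
# argument is named FUN, and `anchor` (an assumed name) selects the timestamp
# that the single-row result is attached to.
colapply_sketch <- function(x, FUN, ..., anchor = c("end", "start"))
{
    anchor <- match.arg(anchor)
    a_out <- apply(X = x, MARGIN = 2, FUN = FUN, ...)

    # This sketch only handles FUN returning one value per column
    if (!is.null(dim(a_out))) {
        stop("FUN must return a scalar for each column in this sketch")
    }

    idx <- if (anchor == "end") end(x) else start(x)
    x_out <- xts(matrix(a_out, nrow = 1), idx)
    colnames(x_out) <- colnames(x)           # re-use input column names
    xtsAttributes(x_out) <- xtsAttributes(x)
    x_out
}

data(sample_matrix)
x <- head(as.xts(sample_matrix), 10)

at_end   <- colapply_sketch(x, sum)                    # last timestamp
at_start <- colapply_sketch(x, sum, anchor = "start")  # first timestamp
```

Because anchor comes after ..., it has to be passed by its full name; that keeps positional arguments free for FUN.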

zeileis commented 1 year ago

I have also written to Kurt now to find out whether he thinks this is important enough to try to propose generics and default methods on Bugzilla.

joshuaulrich commented 1 year ago

> This would probably be most convenient for many users but shadowing functions from base is always potentially dangerous.

I agree about avoiding masking functions from base. That's part of why I opened this issue (because xts::rowMeans() and xts::rowSums() mask the base functions). Though I do sympathize with the new generic being convenient for users.
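
For reference, the shadowing approach under discussion would look roughly like the sketch below. It is only an illustration: the toy_series class and its method are hypothetical stand-ins for a zoo/xts method, and neither package is confirmed here to define apply() this way.

```r
# Sketch only: a generic apply() whose default method defers to base::apply().
# Defining (or exporting) this masks base::apply() wherever it is visible.
apply <- function(X, MARGIN, FUN, ...) UseMethod("apply")
apply.default <- function(X, MARGIN, FUN, ...) base::apply(X, MARGIN, FUN, ...)

# Hypothetical method for a toy class, standing in for a zoo/xts method
apply.toy_series <- function(X, MARGIN, FUN, ...)
{
    out <- base::apply(unclass(X), MARGIN, FUN, ...)
    structure(out, class = "toy_result")
}

m <- matrix(1:6, nrow = 2)
res_default <- apply(m, 1, sum)                                  # default method
res_toy <- apply(structure(m, class = "toy_series"), 2, sum)     # toy method

rm(apply)  # remove the mask again so base::apply() is found directly
```

Every call to apply() now goes through S3 dispatch first, which is the trade-off being weighed: convenient for users, but code that expects base::apply() silently gets the masking version.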

> I would be rather strongly in favor of calling the function argument FUN and not func.

Agreed.

Regarding colapply(), it's not clear to me how it should work, because of the issues you raised. For example, what should the output look like if you call colapply(x, quantile)? The output is a vector for each column... maybe we return a zoo/xts object with duplicate timestamps? Though IIRC, that can cause issues with zoo.

zeileis commented 1 year ago

Sorry, I just now realized that my Markdown formatting was broken and hence my suggestion (b) for multi-row colapply() wasn't displayed properly. My suggestion was to flatten the returned matrix in the following way:

colapply.xts(x, quantile)
##               Open.0%   High.0%    Low.0%  Close.0%  Open.25%  High.25%   Low.25% Close.25%  ...
##  2007-01-11  49.88529  49.99489  49.80454  49.91333  50.00505  50.12096  49.92182  49.98901  ...

If users want the plain matrix output, they can get it via apply().

And duplicated time stamps lead to conceptual problems when merging different series with different numbers of duplicated time stamps. That's why zoo warns about them.
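
A minimal sketch of how that flattening could be done, assuming the column order shown in the example above (all columns for the first statistic, then all columns for the next, and so on). The name colapply_flat() is hypothetical, and the result is anchored at the last timestamp as in the proof-of-concept.

```r
library(xts)

# Hypothetical helper: flatten a multi-row apply() result into a single-row
# xts object with "<column>.<statistic>" names.
colapply_flat <- function(x, FUN, ...)
{
    a_out <- apply(X = x, MARGIN = 2, FUN = FUN, ...)

    if (is.null(dim(a_out))) {
        # FUN returned a scalar per column: keep the input column names
        flat <- a_out
        flat_names <- colnames(x)
    } else {
        # rows of a_out are statistics, columns are the columns of x
        stats <- rownames(a_out)
        if (is.null(stats)) stats <- seq_len(nrow(a_out))
        flat <- as.vector(t(a_out))
        flat_names <- as.vector(outer(colnames(a_out), stats, paste, sep = "."))
    }

    x_out <- xts(matrix(flat, nrow = 1), end(x))  # one row at the last timestamp
    colnames(x_out) <- flat_names
    xtsAttributes(x_out) <- xtsAttributes(x)
    x_out
}

data(sample_matrix)
x <- head(as.xts(sample_matrix), 10)
q <- colapply_flat(x, quantile)
colnames(q)
```

Since the result has a single row with a unique timestamp, it sidesteps the duplicated-index problem mentioned above, at the cost of a wide object whose column names encode two things.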

joshuaulrich / xts

Add `apply()` functions for xts/zoo objects #380