joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
220 stars 71 forks source link

`period.apply` doesn't calculate column-wise aggregates for functions other than mean #366

Closed stulacy closed 1 year ago

stulacy commented 2 years ago

Description

When running period.apply to calculate time-based averages on xts objects with more than 1 column, a single aggregate over all columns is returned for each time-period rather than a column-wise aggregation, unless the mean is used. The reason for this is there is a mean.xts method that determines whether the input is multi-dimesional and returns column-wise means if so. However, other aggregation functions (median, max, min etc...) do not have methods for xts and thus run the standard base function.

A solution would be to check if the data is multi-dimensional within period.apply and calculate multi-dimensional aggregates if so, rather than delegating this behaviour to specific methods.

I can submit a PR if this would be helpful.

Expected behavior

period.apply returns column-wise aggregates for multi-dimensional time-series regardless of which aggregation function is used.

Minimal, reproducible example

library(xts)
library(lubridate)

df <- data.frame(x=seq.POSIXt(from=as_datetime("2020-03-17 00:00:00"), to=as_datetime("2020-03-17 00:14:00"), by="1 min"), y = 1:15, z=21:35)
xts_obj <- xts(df[, c("y", "z")], df$x)

# Works
period.apply(xts_obj, endpoints(xts_obj, on="minutes", k=5), mean)
# Doesn't work
period.apply(xts_obj, endpoints(xts_obj, on="minutes", k=5), median)

Session Info

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.20.so
LAPACK: /usr/lib/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.8.0 xts_0.12.1      zoo_1.8-9      

loaded via a namespace (and not attached):
[1] compiler_4.1.3  generics_0.1.2  grid_4.1.3      lattice_0.20-45
ethanbsmith commented 2 years ago

it really depends on the function being called. the contract for how input is handled and what the output means is really defined by the function.. eg: ATR takes multiple columns in and returns multiple columns. I'd be careful that modifying period.apply to work the way suggested could break almost as many things as its trying to fix.

joshuaulrich commented 2 years ago

I agree with @ethanbsmith. period.apply would have to be a lot more complex to figure out what the supplied function returned in order to work column-wise for any combination of function and xts data input.

mean works column-wise because the default method (stats::mean) used to. Then it was changed to work only for vectors in order to be consistent with other functions (e.g. median). That broke some packages that depend on xts on relied on the original behavior of stats::mean. I wish I would not have made the change in xts because of issues like this.

stulacy commented 2 years ago

That's fair enough, I hadn't realised that's how stats::mean used to work. Would it be useful to add a sentence to the docs explaining that custom functions won't apply column-wise by default?

joshuaulrich commented 1 year ago

I'm closing this because it won't (currently) result in any code changes. It's also related to #124.