joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0

Create an as-of join function #396

Open joshuaulrich opened 1 year ago

joshuaulrich commented 1 year ago

I frequently need to align a lower periodicity series (LP) with a higher periodicity series (HP), where the result only has index values from the index of HP. I want the last value of LP at each HP index value.

This doesn't work with merge.xts(HP, LP, join = "left", fill = na.locf) because na.locf() isn't applied until after the merge is complete, and it's very unlikely that every index value in LP will also be in the HP index. LP observations that fall between HP index values are dropped by the left join before they can be carried forward, so the result of the merge doesn't have the correct values for LP. Instead, you have to do a regular merge and then subset the merge result by index(HP) to get the desired values.

Here's an example of the desired functionality:

set.seed(21)
hours <- timeBasedSeq("2023-01-01/2023-01-02/H")
hours <- xts(seq_along(hours), hours + 60 * runif(length(hours)))

mins <- timeBasedSeq("2023-01-01/2023-01-02/M")
mins <- xts(seq_along(mins), mins)

# the value for 'mins' is always NA because 'hours' has no observations exactly on the minute
x <- merge(hours, mins, join = "left", fill = na.locf)
head(x)
##                     hours mins
## 2023-01-01 00:00:47     1   NA
## 2023-01-01 01:00:15     2   NA
## 2023-01-01 02:00:41     3   NA
## 2023-01-01 03:00:11     4   NA
## 2023-01-01 04:00:57     5   NA
## 2023-01-01 05:00:55     6   NA

# desired result, but inefficient for large data
y <- merge(hours, mins, fill = na.locf)
(y <- y[index(mins)])
##                     hours mins
## 2023-01-01 00:00:00    NA    1
## 2023-01-01 00:01:00     1    2
## 2023-01-01 00:02:00     1    3
## 2023-01-01 00:03:00     1    4
## 2023-01-01 00:04:00     1    5
## 2023-01-01 00:05:00     1    6
## 2023-01-01 00:06:00     1    7
## 2023-01-01 00:07:00     1    8
## 2023-01-01 00:08:00     1    9
## 2023-01-01 00:09:00     1   10
##                 ...
## 2023-01-02 23:50:00    48 2871
## 2023-01-02 23:51:00    48 2872
## 2023-01-02 23:52:00    48 2873
## 2023-01-02 23:53:00    48 2874
## 2023-01-02 23:54:00    48 2875
## 2023-01-02 23:55:00    48 2876
## 2023-01-02 23:56:00    48 2877
## 2023-01-02 23:57:00    48 2878
## 2023-01-02 23:58:00    48 2879
## 2023-01-02 23:59:00    48 2880

Here's a crude implementation that gives the desired result. The actual implementation should be done in C for efficiency.


as_of_join <-
function(x, y, ..., join = c("full", "left", "right"), return_side = c(TRUE, TRUE))
{
    join <- match.arg(join)

    # full outer merge, carrying the last observation forward to fill NA
    out <- merge(x, y, fill = na.locf, retside = return_side)

    # keep only the rows at the index values of the requested side
    out <- switch(join,
                  full = out,
                  left = out[index(x)],
                  right = out[index(y)])

    return(out)
}

as_of_join(hours, mins, join = "right")
##                        x    y
## 2023-01-01 00:00:00   NA    1
## 2023-01-01 00:01:00    1    2
## 2023-01-01 00:02:00    1    3
## 2023-01-01 00:03:00    1    4
## 2023-01-01 00:04:00    1    5
## 2023-01-01 00:05:00    1    6
## 2023-01-01 00:06:00    1    7
## 2023-01-01 00:07:00    1    8
## 2023-01-01 00:08:00    1    9
## 2023-01-01 00:09:00    1   10
##                 ...
## 2023-01-02 23:50:00   48 2871
## 2023-01-02 23:51:00   48 2872
## 2023-01-02 23:52:00   48 2873
## 2023-01-02 23:53:00   48 2874
## 2023-01-02 23:54:00   48 2875
## 2023-01-02 23:55:00   48 2876
## 2023-01-02 23:56:00   48 2877
## 2023-01-02 23:57:00   48 2878
## 2023-01-02 23:58:00   48 2879
## 2023-01-02 23:59:00   48 2880
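
Until a C implementation exists, one way to avoid building the full merge might be to look up positions with base R's findInterval(). The sketch below is only an illustration, not part of xts: the name as_of_lookup and its arguments lp and hp are hypothetical. It assumes both inputs are xts objects, and it relies on the xts index always being sorted, which findInterval() requires.

```r
library(xts)

# Hypothetical helper (not part of xts): for each index value of 'hp',
# take the most recent observation of 'lp' at or before that time.
as_of_lookup <- function(lp, hp)
{
    # for each hp index value, position of the last lp index value
    # that is <= it; 0 means there is no earlier lp observation
    pos <- findInterval(as.numeric(index(hp)), as.numeric(index(lp)))
    pos[pos == 0L] <- NA_integer_

    # subset the lp data by position; NA positions produce NA rows
    lp_vals <- coredata(lp)[pos, , drop = FALSE]

    # result has exactly the index of hp: lp columns first, then hp columns
    xts(cbind(lp_vals, coredata(hp)), index(hp))
}

# small example: three lp observations, four hp index values
lp <- xts(1:3, as.POSIXct(c("2023-01-01 00:00:30",
                            "2023-01-01 01:00:30",
                            "2023-01-01 02:00:30"), tz = "UTC"))
hp <- xts(1:4, as.POSIXct(c("2023-01-01 00:00:00",
                            "2023-01-01 00:30:00",
                            "2023-01-01 01:30:00",
                            "2023-01-01 03:00:00"), tz = "UTC"))
res <- as_of_lookup(lp, hp)
```

In this example the first hp index value comes before every lp observation, so its lp value is NA, which matches the leading NA in the merge-based result above. The sketch never allocates rows for index values that appear only in lp, but it is untested on large data and does not handle column names or the return_side option.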