Create an as-of join function

I frequently need to align a lower periodicity series (LP) with a higher periodicity series (HP), where the result only has index values from the index of HP. I want the last value of LP at each HP index value.

This doesn't work with merge.xts(HP, LP, join = "left", fill = na.locf) because na.locf() isn't applied until after the merge is complete, and it's very unlikely that every index value in LP will also be in the HP index. Any LP observation that falls between HP index values is dropped by the left join before na.locf() can carry it forward, so the result of the merge doesn't have the correct values for LP. Instead, you have to do a regular merge and then subset the merge result by index(HP) to get the desired values.

Here's an example of the desired functionality:

library(xts)
set.seed(21)
hours <- timeBasedSeq("2023-01-01/2023-01-02/H")
hours <- xts(seq_along(hours), hours + 60 * runif(length(hours)))

mins <- timeBasedSeq("2023-01-01/2023-01-02/M")
mins <- xts(seq_along(mins), mins)

# 'mins' is always NA because no 'hours' observation falls exactly on the minute,
# so no 'mins' index value matches an 'hours' index value
x <- merge(hours, mins, join = "left", fill = na.locf)
head(x)
##                     hours mins
## 2023-01-01 00:00:47     1   NA
## 2023-01-01 01:00:15     2   NA
## 2023-01-01 02:00:41     3   NA
## 2023-01-01 03:00:11     4   NA
## 2023-01-01 04:00:57     5   NA
## 2023-01-01 05:00:55     6   NA

# desired result, but inefficient for large data
y <- merge(hours, mins, fill = na.locf)
(y <- y[index(mins)])
##                     hours mins
## 2023-01-01 00:00:00    NA    1
## 2023-01-01 00:01:00     1    2
## 2023-01-01 00:02:00     1    3
## 2023-01-01 00:03:00     1    4
## 2023-01-01 00:04:00     1    5
## 2023-01-01 00:05:00     1    6
## 2023-01-01 00:06:00     1    7
## 2023-01-01 00:07:00     1    8
## 2023-01-01 00:08:00     1    9
## 2023-01-01 00:09:00     1   10
##                 ...
## 2023-01-02 23:50:00    48 2871
## 2023-01-02 23:51:00    48 2872
## 2023-01-02 23:52:00    48 2873
## 2023-01-02 23:53:00    48 2874
## 2023-01-02 23:54:00    48 2875
## 2023-01-02 23:55:00    48 2876
## 2023-01-02 23:56:00    48 2877
## 2023-01-02 23:57:00    48 2878
## 2023-01-02 23:58:00    48 2879
## 2023-01-02 23:59:00    48 2880
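
To make the cost of the merge-then-subset approach concrete, here is a small sketch on assumed toy data (fixed 30-second offsets rather than the random offsets used above). It only checks row counts: the intermediate merge holds the union of both indexes, even though just the rows on the 'mins' index are kept.

```r
library(xts)

# Assumed toy data: 48 hourly points offset by 30 seconds, 2880 minute points
start <- as.POSIXct("2023-01-01", tz = "UTC")
hours <- xts(1:48, start + 3600 * (0:47) + 30)
mins <- xts(1:2880, start + 60 * (0:2879))

# The intermediate merge has one row per distinct timestamp in either series
full <- merge(hours, mins, fill = na.locf)
nrow(full) == nrow(hours) + nrow(mins)

# Only the rows on the 'mins' index survive the subset
kept <- full[index(mins)]
nrow(kept) == nrow(mins)
```

With no shared timestamps, the extra rows equal the length of the lower periodicity series, so the overhead is small here but grows with the number of series being aligned.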

Here's a crude implementation that gives the desired result. The actual implementation should be done in C for efficiency.


as_of_join <-
function(x, y, ..., join = c("full", "left", "right"), return_side = c(TRUE, TRUE))
{
    join <- match.arg(join)

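    # full outer merge, carrying the last observation forward in each column
    out <- merge(x, y, fill = na.locf, retside = return_side)

    # keep only the rows on the index of the requested side
    out <- switch(join,
                  full = out,
                  left = out[index(x)],
                  right = out[index(y)])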

    return(out)
}

as_of_join(hours, mins, join = "right")
##                        x    y
## 2023-01-01 00:00:00   NA    1
## 2023-01-01 00:01:00    1    2
## 2023-01-01 00:02:00    1    3
## 2023-01-01 00:03:00    1    4
## 2023-01-01 00:04:00    1    5
## 2023-01-01 00:05:00    1    6
## 2023-01-01 00:06:00    1    7
## 2023-01-01 00:07:00    1    8
## 2023-01-01 00:08:00    1    9
## 2023-01-01 00:09:00    1   10
##                 ...
## 2023-01-02 23:50:00   48 2871
## 2023-01-02 23:51:00   48 2872
## 2023-01-02 23:52:00   48 2873
## 2023-01-02 23:53:00   48 2874
## 2023-01-02 23:54:00   48 2875
## 2023-01-02 23:55:00   48 2876
## 2023-01-02 23:56:00   48 2877
## 2023-01-02 23:57:00   48 2878
## 2023-01-02 23:58:00   48 2879
## 2023-01-02 23:59:00   48 2880
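
As a possible intermediate step before a C implementation, the lookup can also be sketched in plain R without building the full merge. This is only an illustration: as_of_lookup() is a hypothetical helper, not part of xts, and it assumes both inputs are single-column xts objects. It uses base R's findInterval() to find, for each timestamp in the higher periodicity series, the position of the last lower periodicity observation at or before it.

```r
library(xts)

# Hypothetical helper (not part of xts); assumes 'lp' and 'hp' are
# single-column xts objects
as_of_lookup <- function(lp, hp) {
    # for each 'hp' timestamp, the position of the last 'lp' timestamp that is
    # less than or equal to it (0 when 'lp' has no observation yet)
    pos <- findInterval(.index(hp), .index(lp))
    pos[pos == 0L] <- NA_integer_

    # NA positions produce NA rows, like the leading NA in the output above
    lp_vals <- coredata(lp)[pos, , drop = FALSE]

    out <- cbind(lp_vals, coredata(hp))
    colnames(out) <- c("lp", "hp")
    xts(out, index(hp))
}

# Small hand-built example (assumed data)
start <- as.POSIXct("2023-01-01", tz = "UTC")
lp <- xts(1:3, start + c(30, 3630, 7230))
hp <- xts(1:4, start + c(0, 60, 3600, 7260))
res <- as_of_lookup(lp, hp)
```

The result has one row per 'hp' timestamp, and the only intermediate allocation is the integer position vector.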
