Open jens-daniel-mueller opened 4 years ago
With only two observations, I think there are some issues with computing the variance for the model.
When `as.Date()` is used to create the date vector, the `fill_gaps()` function expands the number of rows from 3 to 5 (daily grid). In this case the interpolation works with only two observations.
When `as.POSIXct()` is used to create the date vector, the `fill_gaps()` function expands the number of rows from 3 to 97 (hourly grid). In this case the interpolation fails as outlined in the initial comment.
This leads me to guess that it is not the variance of the model that causes the problem. However, this is just a guess.
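The jump from 5 to 97 rows is what a regular grid over the same time span would produce. As a rough base R sketch, assuming three hypothetical observations spread over four days (the original dates are not shown in this thread), `seq()` gives the same grid sizes:

```r
# Hypothetical observation times spanning four days (not the original data).
obs_date <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-05"))
obs_time <- as.POSIXct(
  c("2020-01-01 00:00:00", "2020-01-03 00:00:00", "2020-01-05 00:00:00"),
  tz = "UTC"
)

# Regular grids from the first to the last observation, similar in spirit
# to the rows that fill_gaps() adds for a daily or an hourly interval.
daily_grid  <- seq(min(obs_date), max(obs_date), by = "day")
hourly_grid <- seq(min(obs_time), max(obs_time), by = "hour")

length(daily_grid)   # 5
length(hourly_grid)  # 97
```

With the same two non-missing values, the hourly grid leaves far more rows to fill, which may be relevant to why only the POSIXct case fails.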
In addition, I'm skeptical about the `fill_gaps()` approach, because it will probably create very large NA gaps when interpolating time series that span several years with one observation every few days but have a date vector with a resolution of seconds. Is a direct interpolation to the desired time stamp possible?
I still suspect it is the variance for this particular case, but I'll need to look into it more. The model returned from `stats::arima()` has `NaN` variance, likely due to the small number of observed values.
As for your second question, you can definitely do direct interpolation of specific time stamps. However, it depends on the model that you are using. The `ARIMA()` model requires equal spacing between observations, so to interpolate something between two times you'll need to construct equally spaced intermediate values, as is done with `fill_gaps()`. As an example (and the only model I think supports it so far), `TSLM()` supports arbitrary spacing between observations. So if you use `TSLM()` you can specify arbitrary time stamps to interpolate.
@mitchelloharawild, what would the call to `TSLM` look like if you wanted to do a linear interpolation between those points? The ARIMA approach outlined above works well in certain instances, but not in the generic case described below, where one entity (`key == 'A'`) has missing values and the other (`key == 'B'`) consists entirely of three consecutive months of complete data:
```r
library(tidyverse)
library(tsibble)
library(fable)

# Key 'A' is missing March; key 'B' has three complete consecutive months.
df <- data.frame(
  key = c(rep('A', 3), rep('B', 3)),
  date = yearmonth(as.Date(c('2019-01-01', '2019-02-01', '2019-04-01', '2019-01-01', '2019-02-01', '2019-03-01'))),
  value = c(5, 7, 1, 25, 26, 28)
) %>%
  as_tsibble(index = date, key = key) %>%
  fill_gaps()

df %>%
  model(naive = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>%
  interpolate(df)
```
```
Error: Problem with `mutate()` input `interpolated`.
✖ no applicable method for 'interpolate' applied to an object of class "null_mdl"
ℹ Input `interpolated` is `map2(naive, new_data, interpolate, ...)`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
2: 1 error encountered for naive
[1] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
```
The `TSLM` approach with a `trend()` special doesn't give an exact linear interpolation:
```r
df %>%
  model(naive = TSLM(value ~ trend())) %>%
  interpolate(df)
```
```
# A tsibble: 7 x 3 [1M]
# Key:       key [2]
  key       date value
  <fct>    <mth> <dbl>
1 A     2019 Jan  5
2 A     2019 Feb  7
3 A     2019 Mar  3.29   <- this should be 4
4 A     2019 Apr  1
5 B     2019 Jan 25
6 B     2019 Feb 26
7 B     2019 Mar 28
```
I'm not confident `trend()` is the right special, but I'm having trouble grasping what it should be.
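One way to read the 3.29 value: `TSLM()` with `trend()` fits a least-squares straight line through all observations of a key rather than joining neighbouring points, so the March value comes from that fitted line. A minimal base R sketch, using `lm()` and `approx()` as stand-ins (not the fable internals) and the month numbers 1, 2, 4 as an assumed encoding of Jan, Feb, Apr for key A:

```r
# Key "A" from the example: Jan = 1, Feb = 2, Apr = 4 (March is missing).
month <- c(1, 2, 4)
value <- c(5, 7, 1)

# Least-squares line through all three points, evaluated at March.
fit <- lm(value ~ month)
trend_march <- unname(predict(fit, newdata = data.frame(month = 3)))

# Piecewise-linear interpolation between the neighbours Feb and Apr.
linear_march <- approx(x = month, y = value, xout = 3)$y

round(trend_march, 2)  # 3.29
linear_march           # 4
```

The regression line reproduces the 3.29 from the output above, while joining the two neighbouring points gives the expected 4.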
This issue refers to a conversation with Rob Hyndman that started on Stack Overflow.
https://stackoverflow.com/questions/61078446/interpolation-of-irregular-time-series-with-r
I'm looking for a way to interpolate irregular time series data where the timestamp is POSIXct (rather than a date).
Rob proposed the following solution, which does not seem to work with the example df I created.
Thanks for taking a look again!
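For reference, direct interpolation to arbitrary POSIXct time stamps is possible outside fable with base R's `approx()`, which works on the numeric (seconds since epoch) representation of the times. This is only a sketch with made-up times and values, not the solution proposed on Stack Overflow:

```r
# Hypothetical irregular observations (illustrative values only).
obs <- data.frame(
  time = as.POSIXct(
    c("2020-01-01 00:00:00", "2020-01-01 06:00:00", "2020-01-02 00:00:00"),
    tz = "UTC"
  ),
  value = c(10, 16, 7)
)

# Time stamps at which interpolated values are wanted.
target <- as.POSIXct(c("2020-01-01 03:00:00", "2020-01-01 15:00:00"), tz = "UTC")

# approx() needs numeric x values, so convert POSIXct to seconds since the epoch.
interp <- approx(
  x = as.numeric(obs$time),
  y = obs$value,
  xout = as.numeric(target)
)$y

data.frame(time = target, value = interp)
```

This avoids building a regular grid first, at the cost of being limited to piecewise-linear interpolation without a fitted time series model.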