tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

Calendarise self-defined date-times (e.g. business days and time) and respect structural missingness #18

Open earowang opened 6 years ago

earowang commented 6 years ago

tsibble is designed to work with many types of index objects: as long as S3 methods such as index_valid() and pull_interval() are defined for a custom index class (for example, the timeDate class), fill_na() works out of the box.
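Here is a minimal base-R sketch of that S3 contract. It assumes a hypothetical index class `biz_time` and defines local stand-ins named after index_valid() and pull_interval(). These are not tsibble's own definitions, and the real generics may return richer objects than the simplified values used here.

```r
# Local stand-in generics named after tsibble's; NOT tsibble's own definitions.
index_valid <- function(x) UseMethod("index_valid")
index_valid.default <- function(x) FALSE

pull_interval <- function(x) UseMethod("pull_interval")

# Hypothetical custom index class: business-hour date-times stored as POSIXct
new_biz_time <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  class(x) <- c("biz_time", class(x))
  x
}

# Declare the class as a valid index
index_valid.biz_time <- function(x) TRUE

# Simplified interval: smallest gap (in minutes) between consecutive observations
pull_interval.biz_time <- function(x) {
  gaps <- diff(as.numeric(unclass(x))) / 60
  list(minute = min(gaps))
}

idx <- new_biz_time(seq(as.POSIXct("2003-03-03 07:00:00", tz = "UTC"),
                        by = "5 min", length.out = 4))
index_valid(idx)           # TRUE
index_valid(Sys.Date())    # FALSE: falls back to the default method
pull_interval(idx)$minute  # 5
```

Defining methods like these is all S3 dispatch needs; whether tsibble then accepts the class depends on its actual generics.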

Trading/business hours differ from one market or store to another, and no data are recorded outside trading hours. However, fill_na() inserts NA into the non-trading times, because tsibble treats the index as regular calendar time. tsibble needs a more general class that handles custom time ranges and respects these structurally missing observations.
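A base-R sketch of why a calendar-based fill pads non-trading times. It assumes two consecutive weekdays sampled every five minutes from 07:00 to 21:05 inclusive, following the trading hours described for the fpp2 calls data; it does not use the fpp2 data itself.

```r
# Two consecutive weekdays on a regular 5-minute calendar grid (UTC, so no DST jumps)
start <- as.POSIXct("2003-03-03 07:00:00", tz = "UTC")
end   <- as.POSIXct("2003-03-04 21:05:00", tz = "UTC")
calendar_grid <- seq(start, end, by = "5 min")

# Trading hours assumed from the example: 07:00 to 21:05 inclusive
hms <- format(calendar_grid, "%H:%M:%S")
in_hours <- hms >= "07:00:00" & hms <= "21:05:00"

length(calendar_grid)  # 458 slots on the calendar grid
sum(in_hours)          # 340 slots inside trading hours (170 per day)
sum(!in_hours)         # 118 overnight slots a calendar-based fill would pad with NA
```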

For example, the calls data set from the fpp2 package contains five-minute call volumes handled on weekdays between 7:00am and 9:05pm, from 3 March 2003 to 23 May 2003. A possible interface might look like this:

Define your own calendar function:

my_cal <- calendarise(
  # a typical business day starts at 7am and ends at 9:05pm without breaks
  from = "07:00:00",
  to = "21:05:00",
  breaks = NULL,
  # exclude Saturday and Sunday (weekdays numbered from Monday = 1)
  wday = exclude(6:7),
  # exclude a particular date
  date = exclude("2003-04-21") # Easter Monday
)

Then apply it to a vector of date-times, so that tsibble respects the resulting gaps in time:

as_tsibble(fpp2::calls, index = my_cal(index))
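calendarise() and exclude() above are proposals, not existing functions. As a rough sketch of the intended semantics, assuming a hypothetical calendarise_sketch() that returns a predicate, the same rules (07:00 to 21:05, no weekends, no 21 April 2003) can be expressed in base R and used to keep only business times on a 5-minute grid:

```r
# Hypothetical sketch: calendarise_sketch() is not an existing function. It returns
# a predicate that is TRUE for date-times inside the business calendar.
calendarise_sketch <- function(from, to, exclude_wday = integer(),
                               exclude_date = character()) {
  function(x) {
    hms <- format(x, "%H:%M:%S")
    wday <- as.POSIXlt(x)$wday   # 0 = Sunday, ..., 6 = Saturday
    wday[wday == 0] <- 7L        # recode so that Monday = 1, ..., Sunday = 7
    date <- format(x, "%Y-%m-%d")
    hms >= from & hms <= to & !(wday %in% exclude_wday) & !(date %in% exclude_date)
  }
}

is_open <- calendarise_sketch(
  from = "07:00:00", to = "21:05:00",
  exclude_wday = 6:7,          # Saturday and Sunday
  exclude_date = "2003-04-21"  # Easter Monday
)

# Friday 18 April 2003 to Tuesday 22 April 2003 on a 5-minute grid
grid <- seq(as.POSIXct("2003-04-18 07:00:00", tz = "UTC"),
            as.POSIXct("2003-04-22 21:05:00", tz = "UTC"), by = "5 min")
business_grid <- grid[is_open(grid)]
unique(format(business_grid, "%Y-%m-%d"))  # "2003-04-18" "2003-04-22"
length(business_grid)                       # 340
```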

How others handle custom business days and hours:

DavisVaughan commented 6 years ago

I've also done some work on exporting the subset of QuantLib that handles calendar dates and holidays. It's not finished yet, and it doesn't really handle intraday times yet, but I want it to.

https://github.com/DavisVaughan/calendarrr

There is also RQuantLib, but it's massive and a pain to install. This is self-contained.

~It doesn't have native support for excluding weekends, but I don't think it would be too difficult to add.~ There is actually this "bespoke calendar" that starts with no holidays and no weekends defined, and the user can define what they are. This is quite useful as a base calendar. https://github.com/lballabio/QuantLib/blob/master/ql/time/calendars/bespokecalendar.cpp
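A rough R analogue of the bespoke-calendar idea, for illustration only: the calendar starts with no weekends and no holidays, and the user adds both. bespoke_calendar(), add_weekend(), add_holiday() and is_business_day() are hypothetical names, not calendarrr or QuantLib functions.

```r
# Hypothetical analogue of a bespoke calendar: starts empty, user adds rules.
bespoke_calendar <- function() list(weekend = integer(), holidays = as.Date(character()))

add_weekend <- function(cal, wday) {  # wday: 1 = Monday, ..., 7 = Sunday
  cal$weekend <- union(cal$weekend, wday)
  cal
}

add_holiday <- function(cal, date) {
  cal$holidays <- unique(c(cal$holidays, as.Date(date)))
  cal
}

is_business_day <- function(cal, date) {
  date <- as.Date(date)
  wday <- as.POSIXlt(date)$wday
  wday[wday == 0] <- 7L               # Monday = 1, ..., Sunday = 7
  !(wday %in% cal$weekend) & !(date %in% cal$holidays)
}

cal <- bespoke_calendar()
is_business_day(cal, "2003-04-19")    # TRUE: a Saturday, but nothing is excluded yet

cal <- add_weekend(cal, 6:7)
cal <- add_holiday(cal, "2003-04-21")
is_business_day(cal, c("2003-04-18", "2003-04-19", "2003-04-21", "2003-04-22"))
# TRUE FALSE FALSE TRUE
```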

I could see how you could attach a calendar object to a tsibble and then it knows how to adjust the calculations based on the holidays and excluded days from that calendar.

earowang commented 6 years ago

I was poking around calendarrr yesterday. It's a good starting point, although it doesn't provide time adjustment within a day. Maybe you want to share some of your thoughts here.

DavisVaughan commented 6 years ago

On one hand, I'd like to modify the QuantLib source directly to make the internal adjustments that would allow marking certain times within a day as not allowed (for example, outside trading hours). The core of that problem would be defining isAllowedTime() in C++ (the name is up for grabs) for the base calendar, adding adjustments where necessary for the few other calendars that inherit from it, and then altering adjust() to adjust for (1) holidays and (2) intraday hours. The advance() method might also need some work, but I'm not sure yet.

The downside is that if QuantLib changes and we want to merge in the new code, it's not as straightforward as copying and pasting, because we would have changed their source code directly.

I'm not sure if there is a good way to add new methods to the classes that are already there without modifying their code directly, but that would be ideal.

Alternatively, this is something that the Quantlib team might be interested in, so they might be open to having this in quantlib directly.
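To illustrate the proposed order of checks, (1) holidays and weekends, then (2) intraday hours, here is a rough base-R sketch that rolls a date-time forward to the next allowed time, similar in spirit to QuantLib's Following convention. adjust_following() and is_allowed_day() are hypothetical names, not QuantLib code; the opening hours default to the calls example, and weekends are hard-coded as Saturday and Sunday.

```r
# Hypothetical sketch, not QuantLib code.
is_allowed_day <- function(day, holidays) {
  wday <- as.POSIXlt(day)$wday                # 0 = Sunday, 6 = Saturday
  !(wday %in% c(0L, 6L)) && !(format(day) %in% holidays)
}

# Roll a single date-time forward to the next allowed time
adjust_following <- function(x, open = "07:00:00", close = "21:05:00",
                             holidays = character(), tz = "UTC") {
  day <- as.Date(format(x, "%Y-%m-%d", tz = tz))
  hms <- format(x, "%H:%M:%S", tz = tz)
  if (hms > close) {                          # after closing: next day's opening
    day <- day + 1
    hms <- open
  }
  moved <- FALSE
  while (!is_allowed_day(day, holidays)) {    # (1) skip weekends and holidays
    day <- day + 1
    moved <- TRUE
  }
  if (moved || hms < open) hms <- open        # (2) clamp to the opening time
  as.POSIXct(paste(day, hms), tz = tz)
}

at <- function(s) as.POSIXct(s, tz = "UTC")
adjust_following(at("2003-03-03 12:00:00"))                           # unchanged
adjust_following(at("2003-03-03 06:30:00"))                           # 07:00 same day
adjust_following(at("2003-04-18 22:00:00"), holidays = "2003-04-21")  # Tue 22 Apr, 07:00
```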

earowang commented 6 years ago

I'm also interested in whether it's possible to vectorise cal_advance() so that it takes a vector of n, and to have seq() and the arithmetic operators + and - work with calendarrr.

To incorporate a calendar into the tsibble framework, I suppose a new argument calendar = NULL is needed in build_tsibble(), and hence a new attribute calendar in the tbl_ts.
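A minimal sketch of a table-level calendar attribute, assuming a hypothetical build_ts_sketch() in place of the proposed build_tsibble(calendar = NULL) argument (tsibble has no such argument) and a made-up list structure for the calendar. The calendar sits on the table, while the index column stays a plain POSIXct vector.

```r
# Hypothetical stand-in for build_tsibble(calendar = ...); not tsibble's API.
build_ts_sketch <- function(data, index, calendar = NULL) {
  stopifnot(index %in% names(data))
  structure(data, index = index, calendar = calendar,
            class = c("tbl_ts_sketch", class(data)))
}

# Accessor for the calendar stored on the table
calendar_of <- function(x) attr(x, "calendar")

# Assumed calendar layout and a tiny table with placeholder values (not real data)
cal <- list(open = "07:00:00", close = "21:05:00",
            weekend = c(6L, 7L), holidays = "2003-04-21")
df <- data.frame(
  time = as.POSIXct("2003-03-03 07:00:00", tz = "UTC") + 300 * 0:2,
  value = c(1, 2, 3)
)
ts_data <- build_ts_sketch(df, index = "time", calendar = cal)

calendar_of(ts_data)$holidays  # "2003-04-21": the calendar lives on the table
class(ts_data$time)            # "POSIXct" "POSIXt": the index stays a plain date-time
```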

DavisVaughan commented 6 years ago

Does it make much sense to vectorize both dates and n in cal_advance()? How would this behave? cal_advance(Sys.Date() + 1:2, n = c(1, 2))

For seq(), I think we could provide a limited interface to QuantLib's Schedule class (see slide 32 of https://www.quantlib.org/slides/dima-ql-intro-1.pdf).

For + and -, I've been thinking about how these could (should?) work with vectors. The only thing I've come up with is to create a new data type, call it Date_cal, that would have a calendar as an attribute on the vector. Then Date_cal + 2 would know where the holidays are and would default to adding 2 days. Could also do Date_cal + months(2) from lubridate. I don't particularly like this though.
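A rough sketch of the Date_cal idea in base R. The class name comes from the comment above; new_date_cal(), is_bizday() and the attribute layout are hypothetical. `+` walks day by day and counts only business days, so `x + 2` skips weekends and listed holidays; other operators, and months() from lubridate, are not handled.

```r
# Hypothetical Date_cal: a Date vector carrying its own weekend days and holidays.
new_date_cal <- function(x, holidays = character(), weekend = c(6L, 7L)) {
  structure(as.numeric(as.Date(x)), holidays = as.Date(holidays),
            weekend = weekend, class = c("Date_cal", "Date"))
}

is_bizday <- function(d, holidays, weekend) {
  wday <- as.POSIXlt(d)$wday
  wday[wday == 0] <- 7L                      # Monday = 1, ..., Sunday = 7
  !(wday %in% weekend) & !(d %in% holidays)
}

`+.Date_cal` <- function(e1, e2) {
  stopifnot(inherits(e1, "Date_cal"), length(e2) %in% c(1L, length(e1)))
  hol <- attr(e1, "holidays")
  wkd <- attr(e1, "weekend")
  out <- structure(as.numeric(e1), class = "Date")  # plain Date for the arithmetic
  n <- rep_len(as.integer(e2), length(out))
  for (i in seq_along(out)) {
    step <- sign(n[i])
    remaining <- abs(n[i])
    while (remaining > 0) {                  # walk day by day, counting business days
      out[i] <- out[i] + step
      if (is_bizday(out[i], hol, wkd)) remaining <- remaining - 1
    }
  }
  new_date_cal(out, hol, wkd)
}

x <- new_date_cal(c("2003-04-17", "2003-04-18"), holidays = "2003-04-21")
format(x + 1)    # "2003-04-18" "2003-04-22"
format(x + 1:2)  # "2003-04-18" "2003-04-23"
```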

If tbl_ts were the object that had the calendar attribute, then in theory the date vector would not need the attribute as well, especially in mutate() calls, where this would be most useful. To me, this makes the most sense, because the entire tbl_ts object is what has the calendar associated with it. The index vector is just an index vector, so I'd expect tbl_ts$index to return a plain Date object, not a special Date_cal thing.

I'm a bit torn on what to do for this + / - implementation because of this.

earowang commented 6 years ago

Probably vectorize both, as +.Date does, but throw an error instead of a warning if their lengths differ?

x <- Sys.Date() + 1:2
x
#> [1] "2018-08-28" "2018-08-29"
x + 1
#> [1] "2018-08-29" "2018-08-30"
x + 1:2
#> [1] "2018-08-29" "2018-08-31"
x + 1:3
#> Warning in unclass(e1) + unclass(e2): longer object length is not a
#> multiple of shorter object length
#> [1] "2018-08-29" "2018-08-31" "2018-08-31"
x[1] + 1:3
#> [1] "2018-08-29" "2018-08-30" "2018-08-31"
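A minimal sketch of that rule for business-day advancing, assuming a hypothetical cal_advance_sketch() (not calendarrr's cal_advance()) that skips Saturdays, Sundays and listed holidays. Following the x[1] + 1:3 case above, it still recycles a length-1 input, but any other length mismatch is an error rather than a warning.

```r
# Hypothetical sketch, not calendarrr's cal_advance(): move each date by n business
# days (Saturdays, Sundays and `holidays` skipped), vectorised over x and n.
cal_advance_sketch <- function(x, n, holidays = character()) {
  x <- as.Date(x)
  holidays <- as.Date(holidays)
  len <- max(length(x), length(n))
  # Recycle only length-1 inputs; any other mismatch is an error, not a warning
  if (!all(c(length(x), length(n)) %in% c(1L, len))) {
    stop("`x` and `n` must have the same length, or one of them must be length 1.")
  }
  x <- x[rep_len(seq_along(x), len)]   # indexing keeps the Date class
  n <- rep_len(as.integer(n), len)
  is_bizday <- function(d) {
    !(as.POSIXlt(d)$wday %in% c(0L, 6L)) && !(d %in% holidays)
  }
  for (i in seq_len(len)) {
    step <- sign(n[i])
    remaining <- abs(n[i])
    while (remaining > 0) {            # walk one day at a time
      x[i] <- x[i] + step
      if (is_bizday(x[i])) remaining <- remaining - 1
    }
  }
  x
}

x <- as.Date(c("2003-04-17", "2003-04-18"))             # Thursday, Friday
cal_advance_sketch(x, 1, holidays = "2003-04-21")       # "2003-04-18" "2003-04-22"
cal_advance_sketch(x, 1:2, holidays = "2003-04-21")     # "2003-04-18" "2003-04-23"
cal_advance_sketch(x[1], 1:3, holidays = "2003-04-21")  # "2003-04-18" "2003-04-22" "2003-04-23"
try(cal_advance_sketch(x, 1:3))                         # error instead of a warning
```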

fill_na() and lag/lead/difference all need to rely on the calendar. fill_na() currently relies on seq() and arithmetic operators to generate the full time sequence: https://github.com/tidyverts/tsibble/blob/9dabeab8cf742f7e9588356af34a822c7821c6a7/R/fill-na.R#L232

I'd rather not go for an if-else statement (if (!is.null(calendar)) ...) that then calls lots of specialist functions from other packages, which makes the code complicated.
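One way to avoid the if-else branch is to let S3 dispatch decide what the full sequence is. In this sketch, full_index() plays the role of the gap-filling code and only calls seq(); a hypothetical bizdate class supplies its own seq() method. Holidays are left out for brevity, so only weekends are skipped.

```r
# Hypothetical sketch: gap-filling code written only in terms of seq(), so the index
# class, not an if-else on a calendar, decides what the full sequence is.
full_index <- function(x, by = 1) seq(min(x), max(x), by = by)

# Hypothetical business-day class: a Date whose seq() skips Saturdays and Sundays
bizdate <- function(x) structure(as.Date(x), class = c("bizdate", "Date"))

seq.bizdate <- function(from, to, by = 1, ...) {
  # Work on plain Dates to avoid dispatching back into this method
  days <- seq(structure(as.numeric(from), class = "Date"),
              structure(as.numeric(to), class = "Date"), by = "day")
  days <- days[!(as.POSIXlt(days)$wday %in% c(0L, 6L))]
  bizdate(days[seq(1L, length(days), by = by)])  # `by` counts business days
}

obs <- as.Date(c("2003-04-17", "2003-04-22"))   # a Thursday and the next Tuesday
format(full_index(obs))           # all 6 calendar days, weekend included
format(full_index(bizdate(obs)))  # "2003-04-17" "2003-04-18" "2003-04-21" "2003-04-22"
```

The same full_index() call serves both classes; supporting a new calendar means adding a method rather than another branch.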

Also, it would be nice to set a default calendar as a global option, as bizdays does here.
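A minimal sketch of a global default calendar using base R's options() and getOption(). The option name tsibble.calendar and the list layout are assumptions, not an existing tsibble or bizdays setting; an explicit calendar argument always overrides the default.

```r
# Hypothetical default calendar read from a global option; "tsibble.calendar"
# is an assumed option name, not an existing tsibble setting.
default_calendar <- function() {
  getOption("tsibble.calendar",
            default = list(weekend = c(6L, 7L), holidays = character()))
}

# An explicit `calendar` wins; otherwise fall back to the global default
is_working_day <- function(date, calendar = default_calendar()) {
  date <- as.Date(date)
  wday <- as.POSIXlt(date)$wday
  wday[wday == 0] <- 7L                        # Monday = 1, ..., Sunday = 7
  !(wday %in% calendar$weekend) & !(format(date) %in% calendar$holidays)
}

is_working_day("2003-04-21")  # TRUE: the built-in default has no holidays

old <- options(tsibble.calendar = list(weekend = c(6L, 7L), holidays = "2003-04-21"))
is_working_day("2003-04-21")  # FALSE: the holiday comes from the global option
options(old)                  # restore the previous setting
```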