joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
220 stars 71 forks source link

Time of day subsetting by hour only giving unexpected result #326

Closed claymoremarshall closed 3 years ago

claymoremarshall commented 4 years ago

The recent time of day subsetting changes have generated some unexpected results.

library(quantstrat)  # get some intraday bar data
.from='2002-10-21'
.to='2002-10-31'
getSymbols.FI(Symbols='GBPUSD',
              dir=system.file('extdata',package='quantstrat'),
              from=.from, to=.to
              , extension = 'rda'
              , use_identifier=NA
)

Subsetting by just hour used to work, but now gives incorrect results, and also seems to still depend on indexTZ:


head(GBPUSD["T02/T03"])
# Open   High    Low  Close Volume
# 2002-10-22 1.5429 1.5430 1.5429 1.5430      0
# 2002-10-23 1.5464 1.5464 1.5464 1.5464      0
# 2002-10-24 1.5473 1.5473 1.5472 1.5472      0
# 2002-10-25 1.5544 1.5544 1.5544 1.5544      0
# 2002-10-28 1.5474 1.5474 1.5474 1.5474      0
# 2002-10-29 1.5583 1.5583 1.5583 1.5583      0
# Warning message:
#   'indexTZ' is deprecated.
# Use 'tzone' instead.
# See help("Deprecated") and help("xts-deprecated"). 

But we do at least get expected results if including the minutes (but still have the indexTZ issue)

head(GBPUSD["T02:00/T03:00"])
# Open   High    Low  Close Volume
# 2002-10-21 02:00:00 1.5480 1.5480 1.5480 1.5480      0
# 2002-10-21 02:01:00 1.5480 1.5480 1.5480 1.5480      0
# 2002-10-21 02:02:00 1.5479 1.5480 1.5479 1.5480      0
# 2002-10-21 02:03:00 1.5482 1.5482 1.5482 1.5482      0
# 2002-10-21 02:04:00 1.5481 1.5482 1.5481 1.5482      0
# 2002-10-21 02:05:00 1.5481 1.5482 1.5481 1.5482      0
# Warning message:
#   'indexTZ' is deprecated.
# Use 'tzone' instead.
# See help("Deprecated") and help("xts-deprecated"). 

Happy to work on a PR to fix this issue if it hasn't been identified yet?

joshuaulrich commented 4 years ago

Thanks for the report! You are the first to identify and report this bug. Some background: this functionality used to be handled by .parseISO8601(), but is now handled by .subsetTimeOfDay().

I just pushed a change to replace the indexTZ() call with tzone(), so that warning should go away. Thanks for catching that too.

It would be great if you could try to fix this, and even better if you add unit tests to cover this case. As you suggested in an email, it would make sense to add some intraday data to the package to use for examples and tests. Similar to how we use sample_matrix.

I imagine there are going to be several edge cases we need to cover. So please don't hesitate to discuss them here before spending time coding a solution.

claymoremarshall commented 4 years ago

A couple of edge cases related to a "standard" time of day subset, e.g.

x["T02:00/T03:00"]

might be:

1) x["T0200/T0300"] (currently fails because the as.POSIXct conversion in timestringToSeconds fails for "1970-01-01 0200").

Does the subsetting want to support this, or enforce users to do "T02:00/T03:00"?

2) Do you want to support full support for odd edge cases such as: i) x["T02/T02"] to return all data for the second hour ii) Just throwing it out there, but would support for x["T02"] to return all data with 02:xx:xx timestamps (which would be same as subsetting with .indexhour(x) == 2)? I'd be inclined not to permit this, but maybe there would be some value in it

In the case of just supplying the hours, would you now expect x["T02/T03"] to be equivalent to x[.indexhour(x) %in% c(2,3) or x[.indexhour(x) ==2 | .indexhour(x) == 3 & .indexmin(x) == 0 & .indexsec(x) == 0]. IIRC xts supported the former previously. If we use the former, we'll have to modify timeStringToSeconds for the start and end supplied timepoints.