joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
220 stars 71 forks source link

Correct time of day subsetting when using hours. #327

Closed claymoremarshall closed 3 years ago

claymoremarshall commented 4 years ago

e.g. subsetting with time of day using hours only, e.g. "T01/T03", does not currently give expected results, returnong the incorrect rows for the time window.

This requires modifying .subsetTimeOfDay(), which did not cover all behaviour previously covered by subsetting which used .parseISO8601().

Careful attention is needed to ensure the second time string (T03 in the above example) covers all bars up to the end of the hour (03:59:59.9999) as was the behaviour previously with .parseISO8601().

An (optional) timeString validation is done to ensure passed in time strings are valid.

Unit tests also updated for time of day subsetting.

Fixes #326

joshuaulrich commented 4 years ago

I only had time to take a quick look, but this looks very thorough! I will do a more comprehensive review this weekend.

I did notice you added this test: checkException(x["T0100/T0115"]). That will break existing code.

I won't ask you to add comprehensive test coverage for existing functionality before accepting this PR (though I appreciate any tests you do add!). But it's best practice to write and run your unit tests before making any changes, to ensure they produce expected behavior (including errors you intend to fix). So I would appreciate if you ran your tests against master before adding your changes.

claymoremarshall commented 4 years ago

Regarding this, do you want to support x["T0100/T0115"]? I made an assumption that users should be forced to do x["T01:00/T01:15"] instead, but can modify the patch to permit without colons if desired. Just makes string checking a little more tricky

braverock commented 4 years ago

Regarding this, do you want to support x["T0100/T0115"]? I made an assumption that users should be forced to do x["T01:00/T01:15"] instead, but can modify the patch to permit without colons if desired. Just makes string checking a little more tricky

I would suggest requiring the colon. The ISO standard, as I recall, expects the colon, and forcing a model of ##:##:##.######## seems to leave the smallest number of ways in qhich this could be misinterpreted.

joshuaulrich commented 4 years ago

I would suggest requiring the colon. The ISO standard, as I recall, expects the colon

I just looked at the standard. The basic format does not use a colon, while the extended format does. Both are acceptable.

Edit: also, the standard requires each element be zero-padded, so these are the only supported formats:

Extended Basic
hh:mm:ss.sss or hhmmss.sss
hh:mm:ss or hhmmss
hh:mm or hhmm
hh  hh 

That said, we can still support "T1/T2" because that's unambiguous, but "T12/T23" is not. Even "T1020" is not clear whether it represents 1:02, or 10:02... unless we want to say that the leading zero is not required for hours, but is required for minutes and seconds.

joshuaulrich commented 3 years ago

@claymoremarshall thanks a lot for taking the initiative to fix this bug! Your code and tests were a really good start, and made it a lot easier for me to make updates.

I restored the 'basic format' functionality and made the regex a bit more robust. I also removed the ability to omit zero-padding on everything except hours. That wasn't supported previously, so we're not losing anything, and it makes the xts time-of-day strings conform to the standard (except that the leading zero can be omitted from hours, which isn't in the standard).