joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
219 stars 71 forks source link

make.index.unique() can create unsorted index #241

Closed joshuaulrich closed 6 years ago

joshuaulrich commented 6 years ago

make.index.unique() should add a small eps to duplicate index values. This often works, but can fail when there are consecutive observations with the same timestamp, and first observation after the block of duplicate timestamps is less than the cumulative eps.

options(digits.secs = 6)
(x <- .xts(1:5, c(rep(0, 4), 2) / 1e6))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000000    2
# 1969-12-31 18:00:00.000000    3
# 1969-12-31 18:00:00.000000    4
# 1969-12-31 18:00:00.000002    5
(y <- make.index.unique(x))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000001    2
# 1969-12-31 18:00:00.000002    3
# 1969-12-31 18:00:00.000003    4
# 1969-12-31 18:00:00.000002    5

It would be nice if this could be fixed, but floating point rounding error is likely to be a problem in the cases this may occur. A warning should be raised, at minimum.

Session Info

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.10-2.1 zoo_1.8-1   

loaded via a namespace (and not attached):
[1] compiler_3.4.3  tools_3.4.3     grid_3.4.3      lattice_0.20-35
ckatsulis commented 6 years ago

Main consideration here is if xts would be the class of choice for ultra low latency analysis in finance or other disciplines. If that is the case, as you have said earlier, support for more precise time indexing would be the solution. As I don't know the details of that solution, will leave it to you to weight implementation costs and potential issues with backwards compatibility.

joshuaulrich commented 6 years ago

I would like nanosecond resolution in xts, but that will take a bit of work. This problem could theoretically exist even with higher resolution index timestamps, but it would be less probable.

The current solution in this branch works around the issue by always checking and ensuring that the value of newindex[i] is always greater than both index[i-1] and newindex[i-1]. The downside to this solution is that non-duplicate index values may change. I plan to add a warning whenever that happens before merging this branch and closing the issue. I can't think of a better general solution, but I'm open to suggestions.

braverock commented 6 years ago

This sounds correct to me.

It has always been possible that make.index.unique could push things past the next observed index, and that problem only got worse as reported data moved beyond millisecond precision.

Until xts supports nano-scale indexes, checking and warning the user that they may be doing something unintended with make.index.unique seems the best solution.

ghost commented 6 years ago

@joshuaulrich:

I would like nanosecond resolution in xts, but that will take a bit of work.

I showed a possible solution to this problem (https://github.com/joshuaulrich/xts/issues/190#issuecomment-306261054) and yes, it is a lot of work, but in my opinion it is worth taking up this effort. The key is to not break xts R API for external packages.

If you decide on the proposed solution, then of course I will be very committed to help as much as possible (especially in the case of C code).