joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0

Using make.index.unique() doesn't return expected result #206

Closed: ckatsulis closed this issue 7 years ago.

ckatsulis commented 7 years ago

Description

Per the documentation for make.index.unique():

Details

The returned time-series object will have new time-stamps so that isOrdered( .index(x) ) evaluates to TRUE.
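The documented guarantee can be made concrete with a small base-R sketch of one way to break ties using `eps`. This is not xts's own implementation. The helper `bump_ties()` is hypothetical, and it assumes the timestamps are numeric seconds that are already sorted.

```r
# Hypothetical helper (not part of xts): break ties in an already-sorted
# numeric time vector by adding multiples of eps within each run of equal values.
bump_ties <- function(x, eps = 1e-5) {
  # position of each element within its run of identical values: 0, 1, 2, ...
  run_pos <- sequence(rle(x)$lengths) - 1
  x + run_pos * eps
}

times <- c(0, 0, 0, 60, 420, 420)  # seconds; sorted, with duplicates
bumped <- bump_ties(times, eps = 1e-3)
print(bumped)
# TRUE when the result is strictly increasing, i.e. unique and ordered
print(!is.unsorted(bumped, strictly = TRUE))
```

This approach has two limits. It only breaks ties, so unsorted input such as `c(120, 60, 60)` stays unsorted. The largest offset added within a run must also stay smaller than the gap to the next distinct timestamp.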

Expected behavior

When I run:

> xtsWrite = xts::xts(data[,fields], make.index.unique(data$DesiredTime, eps = 1e-05))

I would expect:

> isOrdered(.index(xtsWrite))
[1] TRUE

Instead I receive

> isOrdered(.index(xtsWrite))
[1] FALSE

Minimal, reproducible example

> head(orderData[,c(fields,"DesiredTime")],20)
       OrdPx OrdQty side BidPx AskPx         DesiredTime
1  112.40200   4000   -1    NA    NA 2017-07-17 09:23:00
2    1.14772  10000   -1    NA    NA 2017-07-17 09:23:00
3    1.30678   5000    1    NA    NA 2017-07-17 09:23:00
4    0.96122   6000   -1    NA    NA 2017-07-17 09:23:00
5    1.26357   4000    1    NA    NA 2017-07-17 09:23:00
6  112.42100   2000   -1    NA    NA 2017-07-17 09:24:00
7    1.14792   2000   -1    NA    NA 2017-07-17 09:30:00
8    0.78326   3000    1    NA    NA 2017-07-17 09:30:00
9    1.26347   2000    1    NA    NA 2017-07-17 09:30:00
10   1.30754   5000   -1    NA    NA 2017-07-17 09:30:00
11 112.47600   4000   -1    NA    NA 2017-07-17 09:30:00
12   0.78319   5000    1    NA    NA 2017-07-17 09:31:00
13 112.49900   6000   -1    NA    NA 2017-07-17 09:31:00
14   1.30753   2000    1    NA    NA 2017-07-17 09:31:00
15   1.14843   5000   -1    NA    NA 2017-07-17 09:31:00
16   1.26356   2000   -1    NA    NA 2017-07-17 09:31:00
17 112.47300   2000    1    NA    NA 2017-07-17 09:32:00
18   0.96099   6000    1    NA    NA 2017-07-17 09:31:00
19   1.26350   2000   -1    NA    NA 2017-07-17 09:32:00
20   0.78333   3000   -1    NA    NA 2017-07-17 09:33:00

xtsWrite = xts::xts(orderData[,fields], orderData$DesiredTime)

> head(cbind(xtsWrite$OrdPx,orderData$OrdPx),20)
                    xts$OrdPx  originalData$OrdPx
2017-07-17 09:23:00 112.40200 112.40200
2017-07-17 09:23:00   1.14772   1.14772
2017-07-17 09:23:00   1.30678   1.30678
2017-07-17 09:23:00   0.96122   0.96122
2017-07-17 09:23:00   1.26357   1.26357
2017-07-17 09:24:00 112.42100 112.42100
2017-07-17 09:30:00   1.14792   1.14792
2017-07-17 09:30:00   0.78326   0.78326
2017-07-17 09:30:00   1.26347   1.26347
2017-07-17 09:30:00   1.30754   1.30754
2017-07-17 09:30:00 112.47600 112.47600
2017-07-17 09:31:00   0.78319   0.78319
2017-07-17 09:31:00   0.96099 112.49900
2017-07-17 09:31:00 112.49900   1.30753
2017-07-17 09:31:00   1.30753   1.14843
2017-07-17 09:31:00   1.14843   1.26356
2017-07-17 09:31:00   1.26356 112.47300
2017-07-17 09:32:00 112.47300   0.96099   ### These are out of order from initial set
2017-07-17 09:32:00   1.26350   1.26350
2017-07-17 09:33:00   0.78333   0.78333

### How often the xts column and the original column disagree
> table(xtsWrite$OrdPx ==orderData$OrdPx)

FALSE  TRUE 
  790  3993
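The `FALSE` count is consistent with `xts()` putting rows in index order. Whenever the data frame is not already sorted by `DesiredTime`, a positional comparison against it pairs up different observations. Here is a minimal base-R sketch of the effect, using a made-up three-row frame whose values are illustrative and not taken from orderData:

```r
# Made-up miniature of orderData: row 3 is earlier than row 2
df <- data.frame(
  OrdPx = c(1.26356, 112.47300, 0.96099),
  DesiredTime = c(60, 120, 90)  # numeric seconds, out of order at row 3
)
print(is.unsorted(df$DesiredTime))  # the input is not sorted

# Reorder rows by timestamp, as a constructor that sorts by its index would
sorted <- df[order(df$DesiredTime), ]

# A row-by-row comparison against the unsorted frame now reports mismatches
print(table(sorted$OrdPx == df$OrdPx))
```

Comparing against a copy of the data frame sorted the same way avoids the spurious mismatches.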

Session Info

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tensorflow_1.3     influxdbr_0.12.0   data.table_1.10.4  rbokeh_0.5.0       plotly_4.7.1       roll_1.0.7         rpivotTable_0.2.0  RColorBrewer_1.1-2 RPostgreSQL_0.6-2  quantmod_0.4-10    TTR_0.23-1        
[12] xts_0.10-0         zoo_1.8-0          jsonlite_1.5       ggplot2_2.2.1.9000 dygraphs_1.1.1.4   DBI_0.7            DT_0.2.12          shiny_1.0.3.9002  

joshuaulrich commented 7 years ago

I think the problem is with your data. The Note section of ?make.index.unique says:

Incoming values must be pre-sorted, and no check is done to make sure that this is the case. If the index values are of storage.mode 'integer', they will be coerced to 'double' if drop=FALSE.

Look at the timestamp for rows 17 and 18 of orderData:

17 112.47300   2000    1    NA    NA 2017-07-17 09:32:00
18   0.96099   6000    1    NA    NA 2017-07-17 09:31:00

Those are not sorted, and I think that's the cause of the behavior you see. I can't do more than guess, because your example isn't fully reproducible. You need to provide the object structure (via dput()) in order for me to run your examples.
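Because `make.index.unique()` does not verify its input, a quick check before calling it can catch rows like 17 and 18. Below is a minimal base-R sketch. It assumes the timestamps are numeric seconds, and the vector is made up for illustration. For a data frame, sorting the whole frame (for example `orderData[order(orderData$DesiredTime), ]`) keeps each row's fields together.

```r
# Made-up timestamps in seconds; the fourth value is earlier than the third
desired_time <- c(0, 0, 120, 60, 180)

# Positions whose timestamp is earlier than the one just before it
out_of_order <- which(diff(desired_time) < 0) + 1
print(out_of_order)

# Sort first, so any tie-breaking step receives ordered input
desired_time_sorted <- sort(desired_time)
print(is.unsorted(desired_time_sorted))
```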

ckatsulis commented 7 years ago

Good eyes... I missed that part. It should be resolved on our end anyway, since one of our logging stats has a latent timestamp that we are changing. Thanks for the quick response.

ckatsulis commented 7 years ago

So even after presorting the original data set and making its timestamps unique, when I run make.index.unique() on the resulting xts object, I get the following:

orderData = orderData[order(orderData$DesiredTime),]
orderData$DesiredTime = make.index.unique(orderData$DesiredTime, eps = 1e-03)
xtsWrite = xts::xts(orderData[,fields], orderData$DesiredTime)
xtsWrite = xts::make.index.unique(xtsWrite)
names(xtsWrite) = fields

> isOrdered(orderData$DesiredTime)
> xts::is.index.unique(xtsWrite)
[1] TRUE
> xts::isOrdered(xtsWrite)
[1] FALSE

Attachment: orderData.txt

joshuaulrich commented 7 years ago

Thank you for the reproducible example! orderData$DesiredTime is the number of seconds since the epoch, not POSIXct. So I'm not sure how your xts() call works. I've used .POSIXct() to convert it to POSIXct.

orderData <- dget("https://github.com/joshuaulrich/xts/files/1220681/orderData.txt")
fields <- "OrdId"
orderData$DesiredTime <- make.index.unique(orderData$DesiredTime, eps = 1e-03)
xtsWrite <- xts::xts(orderData[,fields], .POSIXct(orderData$DesiredTime))
xtsWrite <- xts::make.index.unique(xtsWrite)
names(xtsWrite) <- fields
isOrdered(orderData$DesiredTime)
# [1] FALSE
xts::is.index.unique(xtsWrite)
# [1] TRUE
xts::isOrdered(xtsWrite)  # Expect FALSE, since first 3 OrdId are 2, 4, 3.
# [1] FALSE
xts::isOrdered(.index(xtsWrite))  # Expect TRUE, index should be unique and increasing
# [1] TRUE

So I don't see anything wrong with the behavior of make.index.unique() or isOrdered(). Or am I missing something?
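The two `isOrdered()` calls above check different things. One checks the data values (the OrdId column), and the other checks the index. Below is a minimal base-R sketch of both points in this reply. `.POSIXct()` adds the POSIXct class to numeric seconds without shifting the values, and a strictly increasing index can sit alongside an unsorted data column. Base `is.unsorted()` stands in for xts's `isOrdered()` here, and the timestamps are made up.

```r
# Numeric seconds since the epoch, the form orderData$DesiredTime turned out to have
secs <- c(1500283380, 1500283380.001, 1500283440)

# .POSIXct() only sets the class (and time zone); the stored values are unchanged
times <- .POSIXct(secs, tz = "UTC")
print(class(times))

# Made-up data column matching the first three OrdId values mentioned above
ord_id <- c(2, 4, 3)

print(!is.unsorted(times, strictly = TRUE))  # index: strictly increasing
print(!is.unsorted(ord_id))                  # data: not in increasing order
```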

ckatsulis commented 7 years ago

Good catch. orderData$DesiredTime is supposed to be POSIXct, so that one is on me. User error! Thanks for your help, sir!