joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
220 stars 71 forks source link

Add unclass() to findInterval call in window_idx #296

Closed harvey131 closed 5 years ago

harvey131 commented 5 years ago

Description

The performance of window_idx() should be improved if youunclass() on the POSIXct vectors before calling findInterval(). The anyNA() and is.unsorted() calls in findInterval() are faster on vectors of numeric than vectors of POSIXct due to the ALTREP support for numeric vectors.

Line 284 of xts.methods.R:

    # Fast search on index., faster than binsearch if index. is sorted (see findInterval)
    base_idx <- findInterval(index., idx)
    # Fast search on index., faster than binsearch if index. is sorted (see findInterval)
    base_idx <- findInterval(unclass(index.), unclass(idx))

Expected behavior

No change in output

Minimal, reproducible example

# generate vectors of POSIXct
x <- structure( c(123.5,1.1,5.01,121.9,200,3,-13.1,11111) , class = c("POSIXct", "POSIXt"), tzone = '')
vec <- structure( -1e6:1e6L , class = c("POSIXct", "POSIXt"), tzone = '') + 1e-3

microbenchmark::microbenchmark(
  findInterval(x, vec),
  findInterval(x, unclass(vec)),
  findInterval(unclass(x), unclass(vec)),
  check = 'identical')

Unit: milliseconds
                                   expr      min       lq      mean    median        uq      max neval cld
                   findInterval(x, vec) 9.348825 9.530721 13.199000 10.105917 11.758661 44.55282   100   b
          findInterval(x, unclass(vec)) 6.261441 6.526587  9.871345  7.126729  8.873130 32.45702   100  a 
 findInterval(unclass(x), unclass(vec)) 6.284819 6.522024  9.180107  7.064291  8.942411 27.19458   100  a 
joshuaulrich commented 5 years ago

Thanks for this suggestion, and your email to R-devel about the root cause.