runCov - performance improvement by removing unnecessary code

harvey131 commented 5 years ago

I believe the performance of runCov() could be improved,particularly in terms of memory usage, if the following lines are removed.

The conversion with try.xts() seems unnecessary, and the xy <- cbind() is unnecessary since the "xy" variable is never used.

"runCov" <- function(x, y, n=10, use="all.obs", sample=TRUE, cumulative=FALSE) {

  x <- try.xts(x, error=as.matrix)
  y <- try.xts(y, error=as.matrix)
  if(is.xts(x) && is.xts(y)) {
    xy <- cbind(x,y)
  } else {
    xy <- cbind( as.vector(x), as.vector(y) )
  }
  ...

I can submit a pull request, but I have validated that deleting the rows will pass the unit tests.

joshuaulrich commented 5 years ago

Thanks for the suggestion! I appreciate you taking the time to look through the source code to find potential improvements!

I looked at the historical changes of that file, and I think the fact that xy is not used might be a regression. See the state of the function here.

All TTR functions attempt to coerce to xts, so they can support any xtsible time series class. The cbind() call will merge the two series, which then allows us to check the alignment of the observations, and also ensure there are no non-leading NA values in the combined object.

For example, this fails using the version of TTR on CRAN because the combined object has non-leading NA:

x <- .xts(1:10, 1:10)
y <- .xts(10:1, 5:14)
# expect error due to non-leading NA
runCov(x, y)
# Error in `[.xts`(x, beg:NROW(xy)) : subscript out of bounds

The error is correct behavior, but the message could certainly be better.

So I don't think the try.xts() calls should be removed. I'm not sure cbind() call could be more efficient either. Assuming x and y are single-column objects (or vectors), we might be able to do something like this to check for non-leading NA values of the combined object:

xts:::naCheck(merge(x, .xts(,.index(y))))
# Error in xts:::naCheck(merge(x, .xts(, .index(y)))) : 
#   Series contains non-leading NAs

That would cut memory use in half, but I'm not sure the added complexity is worth it.

harvey131 commented 5 years ago

Thanks. The try.xts() issue seems like it just needs unit tests that fails if either of those lines are commented out.

Is your plan to proceed with the 'c' version of runCov and delete the unnecessary xy variable... or to revert back to the fortran version that used 'xy'?

The naCheck would only required if both x and y are xts objects. If the user passes x and y as numeric vectors, the cbind is consuming memory as try.xts converts them to a matrix, then checks they are uni-variate, and coerces them back to a vector. I think if x and y are numeric vectors they could go to the c function directly if they are equal lengths. Similarly if they are matrices of the same NROW with NCOL=1.

harvey131 commented 5 years ago

Additional test cases have been added as a pull request 79

What is the expected behavior in this case?

  # x is xts, y is xts with a different index than x
  y <- xts::as.xts(input$all)$Low
  xts::.index(y) <- xts::.index(y) + 1
  checkEqualsNumeric(runCov(xts::as.xts(input$all)$High, y), output$allCov)

joshuaulrich commented 5 years ago

Sorry for my very slow reply. I've been exceptionally busy at work.

Is your plan to proceed with the 'c' version of runCov and delete the unnecessary xy variable... or to revert back to the fortran version that used 'xy'?

I plan to proceed with the C version. I want to avoid using .Fortran() because it always copies the input objects, unlike the .Call() interface.

The naCheck would only required if both x and y are xts objects. If the user passes x and y as numeric vectors, the cbind is consuming memory as try.xts converts them to a matrix, then checks they are uni-variate, and coerces them back to a vector. I think if x and y are numeric vectors they could go to the c function directly if they are equal lengths. Similarly if they are matrices of the same NROW with NCOL=1.

Great points, and I agree.

Additional test cases have been added as a pull request 79.

I think this PR looks good, but before I merge, could you explain the rationale for the changes to this one test case? Do you think similar changes to other tests would be beneficial?

harvey131 commented 5 years ago

The PR adds test cases for x and y as objects that are xtsible. None of the current test cases use xtsible objects.

I think it's important to add these before submitting another pull request that would optimize performance if x and y are numeric to make sure we dont break any existing functionality.

I havent looked at the other tests to see if they are passing in objects that are xtsible. Going back an earlier point for each of the functions, we should see unit tests fail if we comment out the try-xts() calls in the beginning of the functions.

harvey131 commented 5 years ago

Would you be OK with a pull request to optimize performance where x,y are numeric? As a separate pull request of together? It is somewhat independent of the existing runit tests only using vectors and not xtsible objects that is in the existing pull request.

joshuaulrich commented 3 years ago

I looked at the historical changes of that file, and I think the fact that xy is not used might be a regression.

I just took another look at this, and it does look like a regression. The example in my initial comment errors due to non-leading NA if I change the .Call() to use x = xy[,1] and y = xy[,2], and that is the correct behavior. I'm going to make that change and close this.

joshuaulrich / TTR

runCov - performance improvement by removing unnecessary code #78