Closed AndreMikulec closed 3 years ago
runPercentRank
uses C code, so trying to rewrite it in R to "parallelize" it is almost certain to make it slower. rolling calculations are always performed in loops, and loops are faster in C.
I suppose it could be possible to use an AVX implementation to vectorize parts of it after shift operations, but I suspect that copies would cost more time, and blow up the RAM usage.
Actually, I have the idea to rewrite it (actually more-enhance it) in C or Fortran. (I can do that myself). I am just looking for a high level, very quick 30 seconds overview, of the logic behind the code. The code looks great: very view iterations, and very little memory usage.
Note that runPercentRank()
was originally written in Fortran. I moved it to C because .Call()
is preferred over .Fortran()
in R.
Here's what I have in my notes for exact.multiplier
:
exact.multiplier = 0
basically scores exactly identical values found in the lookback window as being greater than the value being ranked. Settingexact.multiplier = 1
ranks identical values as being purely below the value being ranked, and some multiplier between 0 and 1 countsexact.multiplier * N[identical vals]
(whereN
is the window size) as being below the value being ranked.This parameter matters most when the window is relatively small or when the number of discrete values in the window is small. For non-repeating values, changing
exact.multiplier = 0
toexact.multiplier = 1
for a window of sizeN
will shift the resulting percentile rankings by1/N
. It's equivalent to switching from asking, "how many values are<
this value" to "how many values are<=
this value".
I'll get these into the documentation.
I'm not sure what you're working on that would benefit from running these calculations in parallel, but I'm always interested in making things faster. I guess there could be some benefit if the series was very long and/or the window is very large.
R$ x <- rnorm(1e7)
R$ system.time(p <- runPercentRank(x, 1000))
user system elapsed
31.281 0.084 31.377
Please let me know what you come up with. It may be worth porting some of your ideas to TTR.
Brian and Joshua,
Thanks for the information about "exact.multiplier".
An algorithm that is so very efficient, does not come around too often. If anyone remembers (or has) (yet) "any more" information about its logic, the new information would be nice to know.
Joshua,
I am trying to understand the parameter "exact.multiplier = 0.5" in runPercentRank. What does it do?
Would you mind explaining, in one paragraph of words, of the logic on how runPercentRank, is efficient and works?
I am thinking of trying to parallelized it (if possible).
Thanks, Andre