runPercentRank what does parameter "exact.multiplier = 0.5" do?

AndreMikulec commented 3 years ago

Joshua,

I am trying to understand the parameter "exact.multiplier = 0.5" in runPercentRank. What does it do?

Would you mind explaining, in one paragraph of words, of the logic on how runPercentRank, is efficient and works?

I am thinking of trying to parallelized it (if possible).

Thanks, Andre

braverock commented 3 years ago

runPercentRank uses C code, so trying to rewrite it in R to "parallelize" it is almost certain to make it slower. rolling calculations are always performed in loops, and loops are faster in C.

I suppose it could be possible to use an AVX implementation to vectorize parts of it after shift operations, but I suspect that copies would cost more time, and blow up the RAM usage.

AndreMikulec commented 3 years ago

Actually, I have the idea to rewrite it (actually more-enhance it) in C or Fortran. (I can do that myself). I am just looking for a high level, very quick 30 seconds overview, of the logic behind the code. The code looks great: very view iterations, and very little memory usage.

joshuaulrich commented 3 years ago

Note that runPercentRank() was originally written in Fortran. I moved it to C because .Call() is preferred over .Fortran() in R.

Here's what I have in my notes for exact.multiplier:

exact.multiplier = 0 basically scores exactly identical values found in the lookback window as being greater than the value being ranked. Setting exact.multiplier = 1 ranks identical values as being purely below the value being ranked, and some multiplier between 0 and 1 counts exact.multiplier * N[identical vals] (where N is the window size) as being below the value being ranked.

This parameter matters most when the window is relatively small or when the number of discrete values in the window is small. For non-repeating values, changing exact.multiplier = 0 to exact.multiplier = 1 for a window of size N will shift the resulting percentile rankings by 1/N. It's equivalent to switching from asking, "how many values are < this value" to "how many values are <= this value".

I'll get these into the documentation.

I'm not sure what you're working on that would benefit from running these calculations in parallel, but I'm always interested in making things faster. I guess there could be some benefit if the series was very long and/or the window is very large.

R$ x <- rnorm(1e7)
R$ system.time(p <- runPercentRank(x, 1000))
   user  system elapsed 
 31.281   0.084  31.377

Please let me know what you come up with. It may be worth porting some of your ideas to TTR.

AndreMikulec commented 3 years ago

Brian and Joshua,

Thanks for the information about "exact.multiplier".

An algorithm that is so very efficient, does not come around too often. If anyone remembers (or has) (yet) "any more" information about its logic, the new information would be nice to know.

joshuaulrich / TTR

runPercentRank what does parameter "exact.multiplier = 0.5" do? #107