jwood000 / RcppAlgos

Tool for Solving Problems in Combinatorics and Computational Mathematics
GNU General Public License v2.0
45 stars 5 forks source link

expose nth results to R #2

Closed randy3k closed 6 years ago

randy3k commented 6 years ago

Hi @jwood000,

Thank you for your excellent job. I love the sampling functions comboSample and permSample. Those are the things I wanted to implement for a long time. Though I don't want to admit, your package is far more "useful" than arrangements 👍. Now, the only time I need to call arrangements is when I need to compute the numbers of permutations and combinations with gmp big integer (it may happens if n is large).

Do you have any plans to also expose the nth results to R ? I believe there is a significant overhead to call RcppAlgos::comboGeneral(10, 3, lower = 3, upper = 3) when we just need nthCombination.

jwood000 commented 6 years ago

Hey @randy3k

First off, thanks for the kind words. It is difficult to know sometimes if what you have produced is useful or not. Of course, we can look at number of downloads and things like that, but nothing eclipses a word of approval from a true expert such as yourself.

Now about exposing the nth result, you are correct, there is a bit of overhead when retrieving the nth result in the manner you present.

Short version

This has already been implemented in the sampling functions by making use of the sampleVec argument. See the example code at the bottom.

The Long Version

Before I released version 2.0.0, I initially had an nthCombination and an nthPermutation function precisely for retrieving specific nth results. However, as development evolved and the sampling functions emerged, it made more sense to me to incorporate this capability into the sampling functions.

Underneath these sampling functions, we generate a vector of random indices using the sample function, passing the total number of combinations/permutations to the x parameter. This vector of random integers representing specific nth results is then used to retrieve those specific results. The input is sanitized once and passed to the SampleResults function that retrieves each nth result unobstructed (whereas with the General functions (i.e. combo/permuteGeneral), you would have to sanitize the input for every nth result requested).

This naturally extends to simply passing a vector of indices (without relying on sample) and thus exposing the nth results. Hopefully this makes sense and please do not hesitate to reply if further explanation is needed. Here is an example from the README file of how it can be used:

library(RcppAlgos)

## Use sampleVec to generate specific results
## E.g. the below generates the 1st, 5th, 25th, 125th, and
## 625th lexicographical combinations
comboSample(10, 8, TRUE, sampleVec = c(1, 5, 25, 125, 625))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    1    1    1    1    1    1
[2,]    1    1    1    1    1    1    1    5
[3,]    1    1    1    1    1    1    3    8
[4,]    1    1    1    1    1    3    6    9
[5,]    1    1    1    1    5    6   10   10

## Is the same as:
comboGeneral(10, 8, TRUE)[c(1, 5, 25, 125, 625), ]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    1    1    1    1    1    1
[2,]    1    1    1    1    1    1    1    5
[3,]    1    1    1    1    1    1    3    8
[4,]    1    1    1    1    1    3    6    9
[5,]    1    1    1    1    5    6   10   10

## Is the same as:
do.call(rbind, lapply(c(1, 5, 25, 125, 625), function(x) {
    comboGeneral(10, 8, TRUE, lower = x, upper = x)
}))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    1    1    1    1    1    1
[2,]    1    1    1    1    1    1    1    5
[3,]    1    1    1    1    1    1    3    8
[4,]    1    1    1    1    1    3    6    9
[5,]    1    1    1    1    5    6   10   10

Benchmarks:

Below we have a benchmark that illustrates the overhead you point out when obtaining the nth results through the General functions:

library(microbenchmark)
options(scipen = 999)
set.seed(101)

mySamp <- sample(comboCount(50, 30), 10000)

a <- comboSample(50, 30, sampleVec = mySamp)
b <- lapply(mySamp, function(x) comboGeneral(50, 30, lower = x, upper = x))

all.equal(a, do.call(rbind, b))
[1] TRUE

microbenchmark(t1 = lapply(mySamp, function(x) comboGeneral(50, 30, lower = x, upper = x)),
               t2 = comboSample(50, 30, sampleVec = mySamp))
Unit: milliseconds
 expr      min        lq     mean    median        uq       max neval
   t1 95.86454 136.75282 154.9306 155.53314 171.89110 233.15580   100
   t2 13.66146  14.50935  15.2127  15.04008  15.86957  18.27242   100

We have more than an order improvement (i.e. 10x faster)!

Any further improvements in efficiency should be mostly focused on improving the algorithms in the NthResults source file. I am sure there is room for improvement as I developed these rather organically.

Thanks again! Joseph

randy3k commented 6 years ago

Oh, thanks. I totally missed the sampleVec argument.

randy3k commented 6 years ago

By the way, thanks for the benchmark.! 👍