rstudio-education / hopr

This is the development site for Hands-On Programming with R, a book that teaches how to program in R, with hands-on examples. Read the book at https://rstudio-education.github.io/hopr.
Other
254 stars 331 forks source link

Faster and simpler vectorised prize calculation #67

Open matomatical opened 1 year ago

matomatical commented 1 year ago

I took the advice in the book to implement a faster vectorised score-calculation function using rowSums and vectorised selective updates. The method of decomposing the problem differs from that in the book. Namely, wild diamonds are handled by adding the diamond count to each check, as, per the problem description, the diamonds can be thought of as any other symbol (an exception is made for the cherry cases).

score.count.fast <- function(symbols) {
    # counts of symbols in each sample
    dd <- rowSums(symbols == "DD")
    x7 <- rowSums(symbols == "7")
    b3 <- rowSums(symbols == "BBB")
    b2 <- rowSums(symbols == "BB")
    b1 <- rowSums(symbols == "B")
    cc <- rowSums(symbols == "C")
    # calculate prize with selective updating (higher prizes later to override lower ones)
    prize = integer(nrow(symbols)) # defaults to a number of 0s
    prize[cc == 1]                  <-   2
    prize[cc > 0 & cc + dd == 2]    <-   5
    prize[b3 + b2 + b1 + dd == 3]   <-   5
    prize[cc + dd == 3]             <-  10
    prize[b1 + dd == 3]             <-  10
    prize[b2 + dd == 3]             <-  25
    prize[b3 + dd == 3]             <-  40
    prize[x7 + dd == 3]             <-  80
    prize[dd == 3]                  <- 100
    # apply diamonds doubling effect
    prize * (2 ^ dd)
}

The resulting function is about 3x shorter and, on my machine, about 3x faster than the book's example solution. In my opinion, it's also significantly simpler in the way it handles wild symbols, and I think the method could also easily be generalised to new similar sets of rules without significant alteration. A branch-based, non-vectorised version is also possible, which is of similar complexity and speed to the book's example (but it doesn't serve as a good demonstration of vector lookups).

 score.count <- function(symbols) {
    # count symbols
    dd <- sum(symbols == "DD")
    x7 <- sum(symbols == "7")
    b3 <- sum(symbols == "BBB")
    b2 <- sum(symbols == "BB")
    b1 <- sum(symbols == "B")
    cc <- sum(symbols == "C")
    # calculate prize (higher prizes detected before lower ones)
    if (dd == 3) {
        prize <- 100
    } else if (x7 + dd == 3) {
        prize <- 80
    } else if (b3 + dd == 3) {
        prize <- 40
    } else if (b2 + dd == 3) {
        prize <- 25
    } else if (b1 + dd == 3) {
        prize <- 10
    } else if (cc + dd == 3) {
        prize <- 10
    } else if (b3 + b2 + b1 + dd == 3) {
        prize <- 5
    } else if (cc > 0 && cc + dd == 2) {
        prize <- 5
    } else if (cc == 1) {
        prize <- 2
    } else {
        prize <- 0
    }
    prize * (2 ^ dd)

I tested both versions and they give identical output to the book's code for all combinations of symbols. An rmarkdown script for testing and for profiling with microbenchmark is attached:

This issue serves three purposes

  1. Post the alternative method here for others to learn from if interested
  2. Contribute the simpler method in case the author would like to adapt it for use in future versions of the book
  3. Invite others to further improve the efficiency of the method and share their approaches and results

My thanks to the author and all contributors for their work on making an excellent open source book that helped me efficiently learn the fundamentals of R.