randy3k / collections

High-performance container datatypes for R
https://randy3k.github.io/collections
Other
103 stars 3 forks source link

Consider vector based stack for benchmarking vignette #23

Open TimTaylor opened 1 year ago

TimTaylor commented 1 year ago

For completeness would it be worth including a vector based stack for comparison in the benchmarking vignette? The performance of one based on Martin Morgan's suggestion on stackoverflow is better than a list/environment based one.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(forcats)
library(ggplot2)
library(collections)
#> 
#> Attaching package: 'collections'
#> The following object is masked from 'package:utils':
#> 
#>     stack

# vector based stack (based on https://stackoverflow.com/a/18678440)
vec_stack <- function(type="double", length=1000L) {
    v <- vector(type, length)
    i <- 1L
    len <- length(v)
    list(
        push = function(elt) {
            if (typeof(elt) != type)
                stop("types must match")
            if (i == len) {
                length(v) <<- 1.6 * len
                len <<- length(v)
            }
            v[[i]] <<- elt
            i <<- i + 1L
        },
        pop = function() {
            i <<- i - 1L
            v[[i]]
        },
        clear = function() {
            v <<- vector(type, length)
        }
    )
}

# list based stack from benchmark vignette
list_stack <- function() {
    self <- environment()
    q <- NULL
    n <- NULL
    push <- function(item) {
        if (is.null(item)) {
            q[n + 1] <<- list(item)
        } else {
            q[[n + 1]] <<- item
        }
        n <<- n + 1
        invisible(self)
    }
    pop <- function() {
        if (n == 0) stop("stack is empty")
        v <- q[[n]]
        q <<- q[-n]
        n <<- n - 1
        v
    }
    clear <- function() {
        q <<- list()
        n <<- 0
        invisible(self)
    }
    clear()
    self
}

# bench mark based on one in vignette (slightly extended n)
bench_stack <- bench::press(
    n = c(10, 50, 100, 200, 500, 1000),
    bench::mark(
        `base::list_stack_grow` = {
            q <- list_stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_pre_allocate` = {
            q <- vec_stack(length = n)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_grow` = {
            q <- vec_stack(length = 2L)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `collections::stack` = {
            q <- stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        check = FALSE
    )
) |> 
    mutate(expression = fct_reorder(
        as.character(expression), median, .fun = mean, .desc = TRUE))
#> Running with:
#>       n
#> 1    10
#> 2    50
#> 3   100
#> 4   200
#> 5   500
#> 6  1000

# plot
bench_stack %>%
    ggplot(aes(x = n, y = median)) +
    geom_line(aes(color = expression)) +
    scale_colour_brewer(palette = "Set2", direction = -1) +
    ggtitle("push and pop n times") + ylab("time")

Created on 2023-01-04 with reprex v2.0.2

randy3k commented 1 year ago

The vector based stack works so much better now by not making copies with each append (something in 3.4.x?) If you really want to use it in practice, just make sure you trim the vector length when it is shortened enough, say the length is less than 0.5 * the preallocated length.

Feel free to submit the change for the vignette, I will render the site once the update is merged.

TimTaylor commented 1 year ago

Cool. Will try and PR something over the next couple of weeks.