r-lib / R6

Encapsulated object-oriented programming for R
https://R6.r-lib.org

[Bug?] Unnecessary memory copy #106

Closed: dselivanov closed this issue 7 years ago

dselivanov commented 7 years ago

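I ran into unexpected behavior with R6 classes. For some reason, when an internal field is updated, R sometimes makes an unnecessary copy of the whole field, even though the modification should happen in place. In the examples below, the only difference between `r6_memcp_1` and `r6_memcp_2` is that `update()` in the latter reads `private$arr[index]` before assigning to it. The closure-based `closure_memcp` does the same read without triggering a copy.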

library(R6)
library(microbenchmark)
############################################
# NO COPY
############################################
r6_memcp_1 = R6::R6Class(
  "memcp1",
  public = list(
    initialize = function(k, do_tracemem = TRUE) {
      private$arr = numeric(k)
      if(do_tracemem)
        tracemem(private$arr)

    }, 
    update = function(index, value) {
      private$arr[index] = value
    }, 
    get = function() private$arr
  ), 
  private = list(
    arr = NULL
  )
)
############################################
# COPY
############################################
r6_memcp_2 = R6::R6Class(
  "memcp2",
  public = list(
    initialize = function(k, do_tracemem = TRUE) {
      private$arr = numeric(k)
      if(do_tracemem)
        tracemem(private$arr)
    }, 
    update = function(index, value) {
      # Adding this read causes a memory copy on the assignment below
      temp = private$arr[index]
      private$arr[index] = value
    }, 
    get = function() private$arr
  ), 
  private = list(
    arr = NULL
  )
)
############################################
# NO COPY
############################################
closure_memcp = function(k, do_tracemem = TRUE) {
  arr = numeric(k)
  if(do_tracemem)
    tracemem(arr)
  update = function(index, value) {
    # This read does NOT cause a memory copy
    temp = arr[index]
    arr[index] <<- value
  }
  get = function() arr
  list(update = update, get = get)
}
############################################
N = 1e6
c1 = r6_memcp_1$new(N)
c2 = r6_memcp_2$new(N)
c3 = closure_memcp(N)

index = sample(N, 100)
value = 1.0
microbenchmark(
  c1$update(index, value),
  c2$update(index, value),
  c3$update(index, value),
times = 10)

identical(c1$get(), c2$get())
identical(c1$get(), c3$get())

tracemem[0x13a3be000 -> 0x1167a2000]: microbenchmark
tracemem[0x139c1c000 -> 0x116f44000]: microbenchmark
tracemem[0x116f44000 -> 0x1176e6000]: microbenchmark
tracemem[0x1176e6000 -> 0x117e88000]: microbenchmark
tracemem[0x117e88000 -> 0x11862a000]: microbenchmark
tracemem[0x11862a000 -> 0x118dcc000]: microbenchmark
tracemem[0x118dcc000 -> 0x11956e000]: microbenchmark
tracemem[0x11956e000 -> 0x119d10000]: microbenchmark
tracemem[0x119d10000 -> 0x11a4b2000]: microbenchmark
tracemem[0x11a4b2000 -> 0x11ac54000]: microbenchmark
tracemem[0x11ac54000 -> 0x11b3f6000]: microbenchmark

Unit: microseconds
                    expr      min       lq      mean    median       uq      max neval
 c1$update(index, value)    4.061    7.156  562.3504   20.6545   24.778 5472.335    10
 c2$update(index, value) 4170.111 4278.546 4806.9702 4905.7645 5197.733 5429.694    10
 c3$update(index, value)    3.820    4.191   12.0522   11.0300   13.792   40.875    10

wch commented 7 years ago

This is a consequence of how R performs subset assignment, as in a[1] <- 2: it creates a temporary *tmp* variable. See:

Here's a minimal example:

# ----- Using `$<-` assignment ------
e <- new.env()
e$x <- 1
tracemem(e$x)
# [1] "<0x10d6a0af8>"

x_copy <- e$x
e$x <- 2        # No tracemem output

x_copy <- e$x
e$x <- 3        # No tracemem output

# ----- Using two levels of subset assignment ------
e <- new.env()
e$x <- 1
tracemem(e$x)
# [1] "<0x10d29e758>"

x_copy <- e$x[1]
e$x[1] <- 2
# tracemem[0x10d29e758 -> 0x10d29e9f8]: 

x_copy <- e$x[1]
e$x[1] <- 3
# tracemem[0x10d29e9f8 -> 0x10d29ebd8]: 
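To make the *tmp* mechanism more concrete, the sketch below writes out by hand approximately what R evaluates for a two-level assignment such as e$x[1] <- 2, following the general expansion R uses for replacement calls. The environments `e1` and `e2` are illustrative only. The point is that the current value of the field is pulled out through `` `*tmp*`$x ``, modified by `[<-`, and stored back with `$<-`; this is an approximation of the evaluator's behavior, not a description of its internals.

```r
# Direct form: a two-level subset assignment on an environment field.
e1 <- new.env()
e1$x <- c(1, 1, 1)
e1$x[1] <- 2

# Hand-written approximation of what R evaluates for `e2$x[1] <- 2`:
# 1. the target object is bound to the temporary name `*tmp*`,
# 2. the inner replacement `[<-` is applied to `*tmp*`$x,
# 3. the result is stored back through the outer replacement `$<-`,
# 4. the temporary is removed.
e2 <- new.env()
e2$x <- c(1, 1, 1)
`*tmp*` <- e2
e2 <- `$<-`(`*tmp*`, "x", value = `[<-`(`*tmp*`$x, 1, value = 2))
rm(`*tmp*`)

identical(e1$x, e2$x)
# [1] TRUE
```

Both forms leave the field equal to c(2, 1, 1). Because `$<-` on an environment modifies it in place and returns the same environment, reassigning `e2` does not create a new environment.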

I believe the byte compiler (which will be enabled by default in future versions of R, for packages on CRAN) avoids creating *tmp*, but I don't know for sure.
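One way to test the byte-compiler hypothesis is to run the same read-then-write pattern through a function that is explicitly byte-compiled with `compiler::cmpfun()` and watch whether `tracemem` reports a copy. This is a sketch, not a confirmed result. `read_then_write` and `run_check` are hypothetical helpers. JIT compilation is switched off with `compiler::enableJIT(0)` so that the uncompiled version really stays uncompiled. Whether a copy shows up may depend on the R version.

```r
# Hypothetical helper: the read-then-write pattern from the issue,
# applied to a vector stored in an environment (like R6's `private`).
read_then_write <- function(env, index, value) {
  temp <- env$x[index]   # the read that precedes the write
  env$x[index] <- value  # the subset assignment under test
  invisible(env)
}

# Disable the JIT so the plain version is not compiled behind our back;
# enableJIT() returns the previous level so it can be restored later.
old_jit <- compiler::enableJIT(0)

# Explicitly byte-compiled copy of the same function.
read_then_write_cmp <- compiler::cmpfun(read_then_write)

# Hypothetical helper: runs `f` once on a fresh, traced vector.
run_check <- function(f, n = 1e5) {
  env <- new.env()
  env$x <- numeric(n)
  # tracemem() only works when R was built with memory profiling.
  traced <- isTRUE(capabilities("profmem"))
  if (traced) tracemem(env$x)
  f(env, 1:10, 1.0)  # any "tracemem[...]" lines printed here indicate a copy
  if (traced) untracemem(env$x)
  env$x
}

res_plain <- run_check(read_then_write)
res_cmp   <- run_check(read_then_write_cmp)

compiler::enableJIT(old_jit)  # restore the previous JIT level

identical(res_plain, res_cmp)
# [1] TRUE
```

Both versions produce the same vector. The only thing being compared is whether `tracemem` prints a copy during the call to `f`.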