moodymudskipper / reactibble

Use Dynamic Columns in Data Frames

memoisation #11

Open moodymudskipper opened 3 years ago

moodymudskipper commented 3 years ago

This will make a lot of sense after https://github.com/moodymudskipper/reactibble/issues/10

A big drawback of the package as it stands is that it recomputes columns often; we could use {memoise} in those cases.

Maybe memoisation needs to be at the core of the package, or maybe it needs to be opt-in.

We can technically already use it, of course, but I'd like it to be seamless: I don't want the memoised function to hang around in the environment. If we defined a rowwise memoised computation today, it would look like this (let's ignore that bmi() is vectorized below):

library(memoise)
# BMI is weight divided by height squared
bmi <- function(h, w) w / h^2
bmi_m <- memoise(bmi)
mutate.reactibble(data, bmi = ~ mapply(bmi_m, height_col, weight_col))

We can probably do something nicer with a bit of work.

This could work: at the first evaluation of the quosure we recognize that it returns a memoised function, store that function as the attribute instead of the quosure, and use it to refresh from then on:

mutate.reactibble(data, bmi = ~ memoise(function(height_col, weight_col) weight_col / height_col^2))

We still haven't said whether the computation is rowwise or not, though.

It's also a bit tedious to type. With some metaprogramming we could assume that ... means "every column" and write:

mutate.reactibble(data, bmi = ~ memoise(function(...) weight_col / height_col^2))

That's piling up weirdness though. Something more elegant would be :

mutate.reactibble(data, bmi = m ~ weight_col / height_col^2)

and from there the code would build the real memoised function and set it as the column definition attribute.

The rowwise information is still not given, though.
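As a rough illustration of what "build the real memoised function" from `m ~ expr` could look like, here is a minimal base-R sketch. Everything in it is an assumption made for illustration: `memo()` is a small stand-in for memoise::memoise() (fine for small inputs only), and `memoised_definition()`, `refresh()` and `slow_bmi()` are hypothetical names, not part of {reactibble}. It caches on whole columns, so it leaves the rowwise question open too.

```r
# Stand-in for memoise::memoise(): caches results keyed on the deparsed arguments.
memo <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical: turn `m ~ expr` into a memoised function of the variables it uses.
memoised_definition <- function(formula) {
  stopifnot(length(formula) == 3, identical(formula[[2]], quote(m)))
  body <- formula[[3]]
  env <- environment(formula)
  list(
    vars = all.vars(body),
    # Evaluate the right-hand side with the columns passed as named arguments,
    # falling back to the formula's environment for everything else.
    fun = memo(function(...) eval(body, list(...), env))
  )
}

# Hypothetical refresh step: pass only the columns the expression refers to.
refresh <- function(def, data) {
  cols <- intersect(def$vars, names(data))
  do.call(def$fun, as.list(data[cols]))
}

calls <- 0
slow_bmi <- function(w, h) {
  calls <<- calls + 1  # count real evaluations
  w / h^2
}

def <- memoised_definition(m ~ slow_bmi(weight_col, height_col))
data <- data.frame(height_col = c(1.6, 1.8), weight_col = c(60, 80))

bmi1 <- refresh(def, data)  # computed
bmi2 <- refresh(def, data)  # same inputs: served from the cache
calls_after_two <- calls
data$weight_col[[2]] <- 85
bmi3 <- refresh(def, data)  # inputs changed: computed again
```

The definition object, not the calling environment, holds the memoised function, which is the "seamless" part of the idea.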

It would be nice to find something not too weird. To start with, a proper example similar to the first chunk above would go a long way.

moodymudskipper commented 3 years ago

Here's a simple {memoise} example where the function is applied to the full column. It comes in two parts. In the first, we use a function we just memoised. In the second, the reactibble is built in another environment, so the memoised function is not in the local environment; that's fine, because we evaluate the expression first in the data frame, then in the environment where it was defined.

library(memoise)
library(reactibble)
library(dplyr, warn.conflicts = FALSE)
f <- function(x) {
  Sys.sleep(1)
  mean(x)
}
mf <- memoise(f)

system.time(
  rt <- reactibble(a=1:3, b = ~mf(a))
)
#>    user  system elapsed 
#>    0.02    0.00    1.03
rt
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~dbl>
#> 1     1      2
#> 2     2      2
#> 3     3      2

system.time(
  rt <- mutate(rt, c = ~ b * 2)
)
#>    user  system elapsed 
#>    0.06    0.00    0.06
rt
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~dbl> <~dbl>
#> 1     1      2      4
#> 2     2      2      4
#> 3     3      2      4

build_rt <- function() {
  f <- function(x) {
    Sys.sleep(1)
    max(x)
  }
  mf <- memoise(f)
  reactibble(a=1:3, b = ~mf(a))
}

system.time(
  rt2 <- build_rt()
)
#>    user  system elapsed 
#>    0.00    0.00    1.01
rt2
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~int>
#> 1     1      3
#> 2     2      3
#> 3     3      3

system.time(
  rt2 <- mutate(rt2, c = ~ b * 2)
)
#>    user  system elapsed 
#>    0.01    0.00    0.02
rt2
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~int> <~dbl>
#> 1     1      3      6
#> 2     2      3      6
#> 3     3      3      6
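The lookup order used in the second part (data frame first, then the environment where the definition was written) can be reproduced in base R with eval() and its `enclos` argument. This is only a sketch of the principle, not {reactibble}'s actual code; `build()` and `double_max()` are hypothetical names.

```r
# A function that only exists inside build(), like mf inside build_rt().
build <- function() {
  double_max <- function(x) 2 * max(x)
  ~ double_max(a)  # the formula captures build()'s environment
}

definition <- build()
data <- data.frame(a = 1:3)

# `a` is found in the data; `double_max` is not a column, so the lookup
# falls back to the environment attached to the formula.
result <- eval(definition[[2]], data, enclos = environment(definition))
result
```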
moodymudskipper commented 3 years ago

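If we want to memoise rowwise, we can do as follows: we call a memoised function through mapply, so each row's result is cached. Below, building the reactibble takes about 3 seconds (one call per row), while changing a single value takes about 1 second, since only the modified row is recomputed: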

library(memoise)
library(reactibble)

f <- function(...) {
  Sys.sleep(1)
  mean(c(...))
}

mf <- memoise(f)

system.time(
  rt <- reactibble(a = 1:3, b = 2:4, c = ~ mapply(mf, a, b))
)
#>    user  system elapsed 
#>    0.09    0.00    3.16
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3     3     4    3.5

system.time(
  rt$a[[3]] <- 10L
)
#>    user  system elapsed 
#>    0.01    0.00    1.03
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3    10     4    7

moodymudskipper commented 3 years ago

I really want to be able to memoise on the go though...

If we just call rt <- reactibble(a=1:3, b = 2:4, c = ~ mapply(memoise(f),a, b)), it doesn't work, because memoise() is called again on every call that triggers a refresh, so each refresh starts with a new, empty cache.
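The difference between memoising inline and memoising once can be seen without {reactibble}. The sketch below uses `memo()`, a small base-R stand-in for memoise::memoise() written for this illustration, and two hypothetical `refresh_*()` functions that play the role of a column refresh.

```r
# Stand-in for memoise::memoise(): each call to memo() creates its own cache.
memo <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0
f <- function(x) {
  calls <<- calls + 1  # count real evaluations
  x * 2
}

# Inline: every refresh builds a new wrapper with an empty cache.
refresh_inline <- function() memo(f)(1)
r1 <- refresh_inline()
r2 <- refresh_inline()
calls_inline <- calls  # f ran on both refreshes

# Stored once: every refresh reuses the same wrapper and cache.
calls <- 0
mf <- memo(f)
refresh_stored <- function() mf(1)
r3 <- refresh_stored()
r4 <- refresh_stored()
calls_stored <- calls  # f ran only on the first refresh
```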

I believe we could have a function M(), usable only inside the expression. It would be caught at the preprocessing step through metaprogramming, which would store the memoised function somewhere so that the M(f) call finds it at evaluation time.
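One possible shape for that preprocessing step, as a base-R sketch under stated assumptions: instead of storing the memoised function elsewhere, it splices the function object directly into the expression in place of the M(f) call. `inline_memoised()` is a hypothetical helper, `memo()` is a stand-in for memoise::memoise(), and expressions with empty arguments (such as x[, 1]) are not handled.

```r
# Stand-in for memoise::memoise(): caches results keyed on the deparsed arguments.
memo <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical preprocessing: walk the expression and replace each M(fun) call
# with a memoised function created once, so later refreshes share its cache.
inline_memoised <- function(expr, env) {
  if (is.call(expr)) {
    if (identical(expr[[1]], quote(M))) {
      return(memo(eval(expr[[2]], env)))
    }
    expr <- as.call(lapply(as.list(expr), inline_memoised, env = env))
  }
  expr
}

calls <- 0
f <- function(x, y) {
  calls <<- calls + 1  # count real evaluations
  mean(c(x, y))
}

# Preprocess once, when the column is defined.
definition <- inline_memoised(quote(mapply(M(f), a, b)), environment())

data <- list(a = 1:3, b = 2:4)
first <- eval(definition, data)   # three rows, three real calls
data$a[[3]] <- 10L
second <- eval(definition, data)  # only the changed row triggers a real call
```

M() itself never needs to exist as a function here, since the call is rewritten before anything is evaluated.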

We have a challenge for the evaluation though:

I think it'll work quite nicely