memoisation - GitHub Issues

moodymudskipper commented 3 years ago

This will make a lot of sense after https://github.com/moodymudskipper/reactibble/issues/10

A big drawback of this package as it stands is that it recomputes columns often; we could use {memoise} in these cases.

We can memoise a column computation so that it isn't recomputed when its inputs haven't changed.
But we can go further and memoise some row-wise computations. For instance, if we have a column of models, we don't want to recompute everything when we add or modify a single row; no computation beyond that row needs to be done.

Maybe memoisation needs to be at the core of the package, or maybe it needs to be opt-in.

We can technically already use it, of course, but I'd like it to be seamless: I don't want the memoised function to hang around in the environment. If we did define a row-wise memoised computation today, it would look like this (let's ignore that bmi() is vectorized below):

library(memoise)
# BMI is weight divided by height squared
bmi <- function(h, w) w / h^2
bmi_m <- memoise(bmi)
mutate.reactibble(data, bmi = ~ mapply(bmi_m, height_col, weight_col))

We can probably do something nicer with a bit of work.

This could work: at the first evaluation, when evaluating the quosure, we recognize that we have a memoised function; we store this function as the attribute instead of the quosure and use it to refresh from then on:

mutate.reactibble(data, bmi = ~ memoise(function(height_col, weight_col) weight_col / height_col^2))

We didn't provide the row-wise / not row-wise information, though.

It's also a bit tedious to type. With some metaprogramming we could assume that ... means "every column", and write:

mutate.reactibble(data, bmi = ~ memoise(function(...) weight_col / height_col^2))

That's piling up weirdness, though. Something more elegant would be:

mutate.reactibble(data, bmi = m ~ weight_col / height_col^2)

From there, the code would build the real memoised function and set it as the column definition attribute.
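A minimal sketch of that building step, in plain R outside of {reactibble}. The helper name build_memoised() is hypothetical, and passing every column through ... is an assumption borrowed from the "every column" idea above; the actual package might do this differently.

```r
library(memoise)

# Hypothetical helper (not part of {reactibble}): turn `m ~ expr` into a
# memoised function that evaluates `expr` with the columns it receives.
build_memoised <- function(formula) {
  # expect a two-sided formula whose left-hand side is the marker `m`
  stopifnot(length(formula) == 3L, identical(formula[[2]], quote(m)))
  expr <- formula[[3]]
  env <- environment(formula)
  # columns arrive as named arguments; anything else is looked up in `env`
  memoise(function(...) eval(expr, list(...), env))
}

data <- data.frame(height_col = c(1.6, 1.8), weight_col = c(60, 80))
bmi_m <- build_memoised(m ~ weight_col / height_col^2)

# every column is passed in, so the cache key is the full set of columns
bmi <- do.call(bmi_m, as.list(data))
```

Calling do.call(bmi_m, as.list(data)) a second time with unchanged columns would return the cached result instead of evaluating the expression again.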

The row-wise information is still not given.

We might store it on the left too, so we could have m (memoise), rwm (row-wise memoise) and rw (simple row-wise).
We could also have a helper on the rhs that could be generalized to grouping (e.g. we provide split variables and, if none are provided, it's row-wise), or 2 different verbs, rw and grp.

It would be nice to find something not too weird. As a first step, a proper example similar to the first chunk above would go a long way.

moodymudskipper commented 3 years ago

Here's a simple {memoise} example where the function is applied to the full column. It's in 2 parts. In the first, we use a function we just memoised. In the second, the {reactibble} is built in another environment, so the memoised function is not in the local environment; it still works, because we evaluate the expression first in the data frame, then in the environment where it was defined.

library(memoise)
library(reactibble)
library(dplyr, warn.conflicts = FALSE)
f <- function(x) {
  Sys.sleep(1)
  mean(x)
}
mf <- memoise(f)

system.time(
  rt <- reactibble(a=1:3, b = ~mf(a))
)
#>    user  system elapsed 
#>    0.02    0.00    1.03
rt
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~dbl>
#> 1     1      2
#> 2     2      2
#> 3     3      2

system.time(
  rt <- mutate(rt, c = ~ b * 2)
)
#>    user  system elapsed 
#>    0.06    0.00    0.06
rt
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~dbl> <~dbl>
#> 1     1      2      4
#> 2     2      2      4
#> 3     3      2      4

build_rt <- function() {
  f <- function(x) {
    Sys.sleep(1)
    max(x)
  }
  mf <- memoise(f)
  reactibble(a=1:3, b = ~mf(a))
}

system.time(
  rt2 <- build_rt()
)
#>    user  system elapsed 
#>    0.00    0.00    1.01
rt2
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~int>
#> 1     1      3
#> 2     2      3
#> 3     3      3

system.time(
  rt2 <- mutate(rt2, c = ~ b * 2)
)
#>    user  system elapsed 
#>    0.01    0.00    0.02
rt2
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~int> <~dbl>
#> 1     1      3      6
#> 2     2      3      6
#> 3     3      3      6
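The lookup order that makes the second part work (data frame first, then the environment where the formula was defined) can be sketched with base R alone. eval_col() is a hypothetical stand-in, not {reactibble}'s actual refresh code; it only shows how eval() with an enclosing environment resolves names.

```r
# Hypothetical stand-in for a column refresh: evaluate the right-hand side of
# a one-sided formula in the data first, then in the formula's environment.
eval_col <- function(formula, data) {
  eval(formula[[2]], envir = data, enclos = environment(formula))
}

build <- function() {
  g <- function(x) max(x)  # only visible inside build()
  ~ g(a)                   # the formula remembers build()'s environment
}

fml <- build()
# `a` is found in the data frame, `g` in the environment captured by `fml`
res <- eval_col(fml, data.frame(a = 1:3))
```

Here g() does not exist in the calling environment, yet the evaluation succeeds because the formula carries the environment it was created in.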

moodymudskipper commented 3 years ago

If we want to memoise row-wise, we can do as follows: we use a memoised function in mapply, and the row-wise operations will be cached:

library(memoise)
library(reactibble)

f <- function(...) {
  Sys.sleep(1)
  mean(c(...))
}

mf <- memoise(f)

system.time(
  rt <- reactibble(a=1:3, b = 2:4, c = ~ mapply(mf,a, b))
)
#>    user  system elapsed 
#>    0.09    0.00    3.16
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3     3     4    3.5

system.time(
  rt$a[[3]] <- 10L
)
#>    user  system elapsed 
#>    0.01    0.00    1.03
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3    10     4    7

moodymudskipper commented 3 years ago

I really want to be able to memoise on the go though...

If we just call rt <- reactibble(a=1:3, b = 2:4, c = ~ mapply(memoise(f),a, b)), it doesn't work, because memoise() is called again, with a fresh and empty cache, on every call that triggers a refresh.
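A small sketch of the problem, independent of {reactibble}: the two refresh_*() functions are hypothetical stand-ins for a column refresh, and a call counter replaces Sys.sleep() so the difference is visible without waiting.

```r
library(memoise)

calls <- 0
f <- function(x) {
  calls <<- calls + 1  # count how often the real function runs
  x * 2
}

# Inline: every refresh builds a new memoised wrapper with an empty cache
refresh_inline <- function() sapply(1:3, memoise(f))
for (i in 1:2) refresh_inline()
calls_inline <- calls  # 6: nothing was reused between refreshes

# Shared: the wrapper is created once and survives across refreshes
calls <- 0
mf <- memoise(f)
refresh_shared <- function() sapply(1:3, mf)
for (i in 1:2) refresh_shared()
calls_shared <- calls  # 3: the second refresh is served from the cache
```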

I believe we could have a function M, usable only in the expression. It would be caught at the preprocessing step through metaprogramming, which would store the memoised function somewhere so that it can be found by the M(f) call when evaluating.

We have a challenge for the evaluation though:

We need to create a new env (memoise_env) containing M and a named list of memoised functions. Its parent will be the quosure environment, and this is where the expression will be evaluated.
M is then a simple function that looks up the memoised functions in the list.
The list has an obscure name that won't conflict with anything, like ..memoised_fun_list..; if we want to rule out any chance of conflict, we can set it as an attribute of M, but then we'll need extra weirdness that I don't think is necessary.
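The steps above could be sketched roughly as follows, in plain R. collect_memoised(), make_memoise_env() and refresh() are hypothetical names, the expression walker only handles simple calls (no empty arguments such as x[1, ]), and a call counter stands in for the slow function; this is one possible shape, not the package's implementation.

```r
library(memoise)

# Preprocessing (hypothetical): find every `M(fun)` call in the expression,
# memoise `fun` once, and store the result under the function's name.
collect_memoised <- function(expr, env) {
  funs <- list()
  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    if (identical(e[[1]], quote(M)) && length(e) == 2L && is.symbol(e[[2]])) {
      nm <- as.character(e[[2]])
      funs[[nm]] <<- memoise(get(nm, envir = env, mode = "function"))
    }
    for (arg in as.list(e)[-1]) walk(arg)
  }
  walk(expr)
  funs
}

# memoise_env: child of the formula's environment, holding M and the list
make_memoise_env <- function(formula) {
  env <- new.env(parent = environment(formula))
  env$..memoised_fun_list.. <-
    collect_memoised(formula[[2]], environment(formula))
  # M just looks up the already memoised function by name
  env$M <- function(fun) {
    env$..memoised_fun_list..[[as.character(substitute(fun))]]
  }
  env
}

calls <- 0
f <- function(...) {
  calls <<- calls + 1
  mean(c(...))
}

fml <- ~ mapply(M(f), a, b)
menv <- make_memoise_env(fml)
# columns are looked up in the data first, then M in memoise_env
refresh <- function(data) eval(fml[[2]], data, enclos = menv)

first <- refresh(data.frame(a = 1:3, b = 2:4))               # 3 calls to f
second <- refresh(data.frame(a = c(1L, 2L, 10L), b = 2:4))   # 1 new call
```

After both refreshes f has run 4 times in total: three for the initial rows and one for the modified third row, which mirrors the timings in the row-wise example earlier in the thread.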

I think it'll work quite nicely

moodymudskipper / reactibble

memoisation #11