moodymudskipper opened this issue 3 years ago (open)
Here's a simple {memoise} example where the function is applied to the full column. It's in two parts. In the first, we use a function we just memoised. In the second, the reactibble is built in another environment, so the memoised function is not in the local environment, but we see that it's fine, because the expression is evaluated first in the data frame, then in the environment where it was defined.
```r
library(memoise)
library(reactibble)
library(dplyr, warn.conflicts = FALSE)
f <- function(x) {
  Sys.sleep(1)
  mean(x)
}
mf <- memoise(f)
system.time(
  rt <- reactibble(a = 1:3, b = ~mf(a))  # first call to mf(): ~1 s
)
#>    user  system elapsed
#>    0.02    0.00    1.03
rt
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~dbl>
#> 1     1      2
#> 2     2      2
#> 3     3      2
system.time(
  rt <- mutate(rt, c = ~ b * 2)  # fast: no new 1 s call to f()
)
#>    user  system elapsed
#>    0.06    0.00    0.06
rt
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~dbl> <~dbl>
#> 1     1      2      4
#> 2     2      2      4
#> 3     3      2      4
```
```r
build_rt <- function() {
  f <- function(x) {
    Sys.sleep(1)
    max(x)
  }
  mf <- memoise(f)  # mf only exists inside build_rt()'s environment
  reactibble(a = 1:3, b = ~mf(a))
}
system.time(
  rt2 <- build_rt()
)
#>    user  system elapsed
#>    0.00    0.00    1.01
rt2
#> # A reactibble: 3 x 2
#>       a      b
#>   <int> <~int>
#> 1     1      3
#> 2     2      3
#> 3     3      3
system.time(
  rt2 <- mutate(rt2, c = ~ b * 2)  # mf is still found, and still fast
)
#>    user  system elapsed
#>    0.01    0.00    0.02
rt2
#> # A reactibble: 3 x 3
#>       a      b      c
#>   <int> <~int> <~dbl>
#> 1     1      3      6
#> 2     2      3      6
#> 3     3      3      6
```
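The lookup order described above (columns of the data first, then the environment where the definition was created) can be illustrated with base R alone. This is a minimal sketch, not reactibble's internals: `make_definition()` is a hypothetical helper, and a plain list holding an expression and an environment stands in for a quosure.

```r
# Hypothetical helper: capture a column definition together with the
# environment it was created in (a rough stand-in for a quosure).
make_definition <- function() {
  helper <- function(x) max(x)  # only visible inside this function's environment
  list(expr = quote(helper(a)), env = environment())
}

def <- make_definition()
df <- data.frame(a = 1:3)

# `a` is found among the data frame's columns; `helper` is not a column,
# so it is looked up in the enclosing (defining) environment instead.
result <- eval(def$expr, envir = df, enclos = def$env)
result
#> [1] 3
```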
If we want to memoise row-wise, we can use a memoised function in `mapply()`, and row-wise operations will be cached:
```r
library(memoise)
library(reactibble)
f <- function(...) {
  Sys.sleep(1)
  mean(c(...))
}
mf <- memoise(f)
system.time(
  rt <- reactibble(a = 1:3, b = 2:4, c = ~ mapply(mf, a, b))  # 3 rows, ~3 s
)
#>    user  system elapsed
#>    0.09    0.00    3.16
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3     3     4    3.5
system.time(
  rt$a[[3]] <- 10L  # only the changed row is recomputed, ~1 s
)
#>    user  system elapsed
#>    0.01    0.00    1.03
rt
#> # A reactibble: 3 x 3
#>       a     b      c
#>   <int> <int> <~dbl>
#> 1     1     2    1.5
#> 2     2     3    2.5
#> 3    10     4    7
```
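To see why only one row is recomputed after `rt$a[[3]] <- 10L`, here is a sketch that counts real computations instead of sleeping. It makes two simplifying assumptions: `simple_memoise()` is a hypothetical, deliberately naive stand-in for `memoise::memoise()` (it keys its cache on the deparsed arguments), and the columns are plain vectors rather than a reactibble.

```r
# Hypothetical, naive stand-in for memoise::memoise(): results are cached
# in an environment, keyed on the deparsed arguments.
simple_memoise <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

n_calls <- 0
f <- function(...) {
  n_calls <<- n_calls + 1  # count real computations instead of sleeping
  mean(c(...))
}
mf <- simple_memoise(f)

a <- 1:3
b <- 2:4
c1 <- mapply(mf, a, b)  # three rows, three cache misses
a[[3]] <- 10L
c2 <- mapply(mf, a, b)  # rows 1 and 2 hit the cache, row 3 is recomputed
c2
#> [1] 1.5 2.5 7.0
n_calls
#> [1] 4
```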
I really want to be able to memoise on the go, though...
If we just call `rt <- reactibble(a = 1:3, b = 2:4, c = ~ mapply(memoise(f), a, b))`, it doesn't work, because `memoise()` is called again at every call that triggers a refresh, so each refresh starts from a new, empty cache.
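The failure mode can be reproduced in base R by counting computations. In this sketch `simple_memoise()` is a hypothetical, naive stand-in for `memoise::memoise()`, and the loop simulates two refreshes of a definition that calls the memoiser inside the expression.

```r
# Hypothetical, naive stand-in for memoise::memoise().
simple_memoise <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

n_calls <- 0
f <- function(...) {
  n_calls <<- n_calls + 1  # count real computations
  mean(c(...))
}
a <- 1:3
b <- 2:4

# Two refreshes of a definition like `c = ~ mapply(memoise(f), a, b)`:
# the memoiser is rebuilt each time, so each refresh starts with an empty cache.
for (refresh in 1:2) {
  res <- mapply(simple_memoise(f), a, b)
}
n_calls
#> [1] 6
```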
I believe we could have a function `M()`, usable only in the expression, that would be caught at the preprocessing step through metaprogramming and would store the memoised function somewhere, so that it would be found by the `M(f)` call at evaluation time.
We have a challenge for the evaluation, though: […] an environment (`memoise_env`) containing `M` and a named list of memoised functions. Its parent will be the quosure environment, and this is where the expression will be evaluated. `M` is then a simple function that looks up the memoised functions in the list `..memoised_fun_list..`. If we wanted to rule out any chance of a conflict, we could set that list as an attribute of `M`, but then we'd need extra weirdness that I don't think is necessary. I think it'll work quite nicely.
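A minimal base-R sketch of the `memoise_env` idea, under several assumptions: `simple_memoise()` is a naive stand-in for `memoise::memoise()`, `find_M_targets()` and `build_memoise_env()` are hypothetical names for the preprocessing step, `M()` identifies a function by the name it is called with, and a plain expression plus an environment stand in for the quosure. The memoised functions are built once at preprocessing, so later refreshes reuse the same caches.

```r
# Hypothetical, naive stand-in for memoise::memoise().
simple_memoise <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical preprocessing step: collect the names of functions wrapped in M().
find_M_targets <- function(expr) {
  if (!is.call(expr)) return(character())
  here <- if (identical(expr[[1]], as.name("M"))) deparse(expr[[2]]) else character()
  c(here, unlist(lapply(as.list(expr)[-1], find_M_targets)))
}

# Hypothetical: build memoise_env once per column definition. Its parent is the
# quosure environment; it holds M() and the named list of memoised functions.
build_memoise_env <- function(expr, quo_env) {
  memoise_env <- new.env(parent = quo_env)
  fun_names <- unique(find_M_targets(expr))
  memoise_env$..memoised_fun_list.. <- setNames(
    lapply(fun_names, function(nm) simple_memoise(get(nm, envir = quo_env))),
    fun_names
  )
  # M() simply returns the memoised version stored under the same name.
  memoise_env$M <- function(f) {
    memoise_env$..memoised_fun_list..[[deparse(substitute(f))]]
  }
  memoise_env
}

n_calls <- 0
f <- function(...) {
  n_calls <<- n_calls + 1  # count real computations
  mean(c(...))
}
expr <- quote(mapply(M(f), a, b))
memoise_env <- build_memoise_env(expr, environment())  # once, at preprocessing
df <- data.frame(a = 1:3, b = 2:4)

# Two refreshes: a and b come from df; M() is found in memoise_env.
for (refresh in 1:2) {
  res <- eval(expr, envir = df, enclos = memoise_env)
}
n_calls
#> [1] 3
```

Keeping `..memoised_fun_list..` as a regular binding in `memoise_env`, rather than as an attribute of `M`, matches the simpler option preferred above.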
This will make a lot of sense after https://github.com/moodymudskipper/reactibble/issues/10.
A big downside of the package as it stands is that it recomputes columns often; we could use {memoise} in these cases.
Maybe memoisation needs to be at the core of the package, or maybe it should be opt-in.
We can technically already use it, of course, but I'd like it to be seamless: I don't want the memoising function to hang out in the environment. If we did define a row-wise memoised computation, it would look like this (let's ignore that `bmi()` is vectorised below):
We can probably do something nicer with a bit of work.
This could work: at the first evaluation, we recognise that we have a memoising function when evaluating the quosure, store this function as the attribute instead of the quosure, and use it to refresh from then on:
We didn't provide the row-wise / not-row-wise information, though.
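The attribute swap described above (evaluate the quosure once and, if it yields a memoising function, store that function as the column's definition and use it for later refreshes) could look roughly like this. Everything here is hypothetical: `refresh_column()`, `make_memoised()`, the `"definition"` attribute name and the convention of calling the stored function with the data's columns as named arguments are assumptions for illustration, and `simple_memoise()` is a naive stand-in for `memoise::memoise()`.

```r
# Hypothetical, naive stand-in for memoise::memoise().
simple_memoise <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical refresh: the "definition" attribute is either an
# expression + environment, or (after the first refresh) a memoised function.
refresh_column <- function(col, data) {
  def <- attr(col, "definition")
  if (!is.function(def)) {
    evaluated <- eval(def$expr, envir = data, enclos = def$env)
    if (!is.function(evaluated)) {
      # Ordinary definition: keep the expression for future refreshes.
      return(structure(evaluated, definition = def))
    }
    def <- evaluated  # a memoising function: store it instead of the expression
  }
  structure(do.call(def, as.list(data)), definition = def)
}

n_builds <- 0
n_calls <- 0
make_memoised <- function() {
  n_builds <<- n_builds + 1  # how often the memoiser is created
  simple_memoise(function(a) {
    n_calls <<- n_calls + 1  # how often the real computation runs
    a * 2
  })
}

df <- data.frame(a = 1:3)
col <- structure(NA, definition = list(expr = quote(make_memoised()), env = environment()))
col <- refresh_column(col, df)  # builds the memoiser and computes
col <- refresh_column(col, df)  # reuses the stored memoised function: cache hit
c(n_builds, n_calls)
#> [1] 1 1
```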
It's a bit tedious to type, though; with some metaprogramming we could assume that `...` means "every column", and write:
That's piling up weirdness, though. Something more elegant would be:
and from there the code would build the real memoised function and set it as the column definition attribute.
Row-wise information was still not given.
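The idea of letting `...` mean "every column" can be sketched as a small expression rewrite in base R. `expand_dots()` is a hypothetical helper, not part of reactibble: it replaces each bare `...` argument with one symbol per column name and recurses into nested calls.

```r
# Hypothetical helper: rewrite `...` in a call into one argument per column.
expand_dots <- function(expr, col_names) {
  if (!is.call(expr)) return(expr)  # symbols and constants are left alone
  parts <- as.list(expr)
  arg_names <- names(parts)
  out <- parts[1]  # the function being called
  for (i in seq_along(parts)[-1]) {
    if (identical(parts[[i]], as.name("..."))) {
      # `...` becomes one argument per column
      out <- c(out, lapply(col_names, as.name))
    } else {
      # keep the argument (and its name, if any), rewriting nested calls
      out <- c(out, setNames(list(expand_dots(parts[[i]], col_names)), arg_names[i]))
    }
  }
  as.call(out)
}

expanded <- expand_dots(quote(mean(c(...))), c("a", "b"))
expanded
#> mean(c(a, b))
eval(expanded, data.frame(a = 1:2, b = 3:4))
#> [1] 2.5
```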
`m` (memoise), `rwm` (row-wise memoise) and `rw` (simple row-wise).
`rw` and `grp`
It would be nice to find something not too weird. To start with, a proper example similar to the first chunk above would go a long way.
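The `m` / `rw` / `rwm` split mentioned above could be sketched as function factories. This is a hypothetical reading of their roles, not an agreed design: `m()` memoises a function applied to whole columns, `rw()` applies a function row by row, and `rwm()` does both. `simple_memoise()` is a naive stand-in for `memoise::memoise()`.

```r
# Hypothetical, naive stand-in for memoise::memoise().
simple_memoise <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

m   <- function(f) simple_memoise(f)                               # whole columns, memoised
rw  <- function(f) function(...) mapply(f, ..., USE.NAMES = FALSE)  # row by row, no cache
rwm <- function(f) {                                               # row by row, cached per row
  mf <- simple_memoise(f)
  function(...) mapply(mf, ..., USE.NAMES = FALSE)
}

n_calls <- 0
f <- function(x, y) {
  n_calls <<- n_calls + 1  # count real computations
  x + y
}
a <- 1:3
b <- c(10L, 20L, 30L)

row_f <- rwm(f)     # built once, as if at preprocessing
row_f(a, b)         # three rows, three computations
a[[3]] <- 5L
res <- row_f(a, b)  # only the changed row is recomputed
n_calls
#> [1] 4
```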