klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
833 stars 47 forks source link

autoreload feature #234

Open grst opened 2 years ago

grst commented 2 years ago

While developing interactively, it would be fantastic if the modules are re-loaded automatically in the background every time a module has changed. This is how the %autoreload magic works in jupyter notebooks.

Use-case

1. Working on rmarkdown documents

I believe it is considered good practice to only have the code directly needed for generating the report within the rmarkdown document itself and to refactor helper functions into a separate file. Now, if I work on the document and have my helper functions loaded with, say box::use(./utils), I need to run reload(utils) every time I change a utility function.

2. Modular shiny apps

I found that box is a quite effective way to structure modular shiny apps. The only downside is that when hitting the "Reload App" button in Rstudio wil not reload any changes I made to the shiny modules I loaded with box. My current workaround is to have a reload(module) right underneath each use(module).

Suggested implementation

Enable autoreload by setting a global configuration flag, e.g.

box::set_autoreload(TRUE)

# or borrowing flags from ipython's %autoreload
box::set_autoreload(2)

I don't know how this would work under-the-hood, but probably watch all files from which functions have been imported and trigger box::reload every time a file has changed.

klmr commented 2 years ago

I like the idea. However, the implementation will be fairly complex, especially when avoiding dependencies on other packages. I’m not familiar with IPython’s %autoreloader functionality but a quick look at the source code reveals that its implementation is made much simpler because it can hook into IPython, and run every time the user executes a cell. In R we can’t do that (outside of RMarkdown there are no “cells”), so a general solution for ‘box’ would have to implement a filesystem watcher, which is hugely complex to do safely cross-platform.

Alternatively, it might be possible to create a solution that only works with RMarkdown Notebooks. Unfortunately RMarkdown, unlike IPython, does not seem to define a hook that is executed for every chunk (neither the option hooks nor the output hooks of ‘knitr’ are suitable for this) so this would first require changes to the ‘knitr’ package.

For Shiny, it might also be possible to hook into the “reload app” functionality to accomplish this. Again, I don’t have any experience with this (my only experience using Shiny is via the ‘golem’ framework, and that does not support RStudio’s “Reload App” functionality). I’m a bit puzzled how this functionality works anyway, it completely breaks the usual way in which R code is executed. And as far as I can see RStudio itself would have to add support to use this functionality, it’s not currently exposed to third-party packages.

So unless I’m overlooking something (entirely possible!) this would either require changes to RMarkdown/‘knitr’ and RStudio, or adding a platform-independent, dependency-free (vendored) filesystem watcher to ‘box’.

All in all, this probably won’t happen soon unless somebody contributed a suitable PR, unfortunately.


Separate from the above, there’s the other issue that box::reload currently (intentionally) only reloads a given module instance, but it does not reload attached names. This could certainly be changed but it would also require some effort. I’d almost consider that as an orthogonal feature.

grst commented 2 years ago

Thanks for your elaborate response!

I was mostly thinking along the lines of a file watcher. A general solution would ensure that it extends to other use-cases as well and does not depend on specific tooling (for instance, what if I prefer to develop Rmarkdown notebooks in vscode?).

Testthat seems to have a rather simple (though maybe a bit hacky) file system watcher entirely written in R: https://github.com/r-lib/testthat/blob/18b2c6c5cc8c4254da499d0d17ba4694faac1d1d/R/watcher.R

Do you think this is something that could work?

Separate from the above, there’s the other issue that box::reload currently (intentionally) only reloads a given module instance, but it does not reload attached names

This wouldn't be very important for me. Attaching entire namespaces is discouraged anyway, isn't it? Or would this also affect single functions imported like box::use(mymodule[foo])?

klmr commented 2 years ago

Unfortunately the busy polling technique used by ‘testthat’ can’t be used here because it’s synchronous. But I’m realising that we don’t actually need this: If (and only if!) the user activates auto-reloading, ‘box’ could record all file dependencies for a loaded module, and upon box::use it would simply walk all these dependencies, check their file modification timestamp, and unload them as necessary, to force reloading.

The implementation is still not trivial since ‘box’ currently doesn’t maintain such a detailed list of dependencies. But I think it’s feasible.


Or would [reloading] also affect single functions imported like box::use(mymodule[foo])?

Your example can’t be reloaded. To make it reloadable, we need to create an explicit module alias, e.g.

box::use(mymodule = ./mymodule[foo])

And now box::reload(mymodule) would reload the module, but foo would still refer to the old function definition (and any module-internal function or other dependency that foo uses would also refer to the old version).

grst commented 2 years ago

Unfortunately the busy polling technique used by ‘testthat’ can’t be used here because it’s synchronous.

I see!

But I’m realising that we don’t actually need this: If (and only if!) the user activates auto-reloading, ‘box’ could record all file dependencies for a loaded module, and upon box::use it would simply walk all these dependencies, check their file modification timestamp, and unload them as necessary, to force reloading.

This would solve the shiny issue, but not (or only to a certain extent) the rmarkdown use-case. If I use box::use(./mymodule) in the first cell and call mymodule$myfun() in the second cell, and mymodule.R gets modified, I still would need to re-run the first cell for the changes to become effective. With a file system watcher it would update in the background.

I got another idea how to solve this potentially that doesn't require a file system watcher (and also no list of dependencies). When the autoreload flag is set, all modules get replaced by "module wrappers" which always check for modifications and reload if necessary before providing access to an object. I am not sure how to implement this in R, or if this approach has other drawbacks, but here's a blueprint in Python to better illustrate what I mean:


class ModuleWrapper:
    def __init__(self, module_path):
         # `Module` represents the actual module (whatever box uses now)
         self.module = Module(module_path)

    def __getattr__(self, name):
         # Check if module has been modified and reload if necessary
         if self.module.was_modified:
              self.module.reload()
         # forward attribute access to actual module
         return getattr(self.module, name)
klmr commented 2 years ago

When the autoreload flag is set, all modules get replaced by "module wrappers" which always check for modifications and reload if necessary before providing access to an object.

Ah, that’s very clever. Yes, R might allow something similar, e.g. via active bindings. Alternatively, the $ operator itself could be rewritten correspondingly. I’ll give it a shot, but this’ll have to wait until after the next CRAN release, which will bundle the changes up until now.

GitHunter0 commented 2 years ago

My current workaround is to have a reload(module) right underneath each use(module).

Also, how about adding an option reload=TRUE in use() to make this operation in one single step?

klmr commented 2 years ago

@GitHunter0 The auto-reload mechanism that’s currently under development provides a better, more comprehensive solution. You can check the progress and discussion here: #246.

GitHunter0 commented 2 years ago

Thanks @klmr , this is great news!

nvelden commented 1 year ago

Did this function ever made it to the main repository? I don't see it when I install the latest version from: install.packages('box', repos = 'https://klmr.r-universe.dev')

klmr commented 1 year ago

@nvelden Unfortunately not yet, since I was hoping for more feedback from active users, both to gain confidence that there are no glaring bugs, and to get feedback on the usability of the functions.

That said, I have been testing the feature myself for a while now, and I haven’t discovered any bugs. At the same time, I now thing that I will probably change how the functionality is invoked, but I think it can be merged fairly soon.


In the future I would still like to reimplement this feature in a smarter way (using filesystem watchers) but such an implementation would be much more complex (it would probably more than double the overall size of the ‘box’ code base, and requires writing complex C code that targets separate operating systems).

nvelden commented 1 year ago

@klmr That would be great! Currently, I'm using box::purge_cache() in the console to clear the cache whenever I make any changes, before I restart the app. This seems to be effective, but having a feature that automatically does this within the Shiny app would certainly be a great improvement.

Btw. Thanks a lot for this fantastic package. It makes importing and documenting functions within Shiny apps so much easier. I am now implementing it in any Shiny app I build.