tempfile persistence

simonpcouch commented 2 years ago

Always unlink() temp files in bundle methods.
Introduce testing infrastructure that would break access to those temp files even if they still existed.
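For the first item, here is a minimal sketch of the pattern, assuming a model that can only be saved to and loaded from a file path. `save_to_bytes()` and `load_from_bytes()` are hypothetical helper names for illustration, not the bundle package's API. The idea is to keep the file's bytes rather than its path, and to unlink the temp file on exit either way. It also assumes `load_fn` reads the whole file into memory before returning.

```r
# Hypothetical helper (not part of bundle): write the object to a temp file,
# keep its raw bytes, and always delete the file, even if saving errors.
save_to_bytes <- function(object, save_fn) {
  path <- tempfile(fileext = ".bin")
  on.exit(unlink(path), add = TRUE)
  save_fn(object, path)
  readBin(path, what = "raw", n = file.size(path))
}

# Hypothetical helper: write the bytes to a fresh temp file, load from it,
# and remove the file once load_fn has returned.
load_from_bytes <- function(bytes, load_fn) {
  path <- tempfile(fileext = ".bin")
  on.exit(unlink(path), add = TRUE)
  writeBin(bytes, path)
  load_fn(path)
}

# saveRDS()/readRDS() stand in for a model's own file-based save/load methods.
bytes <- save_to_bytes(mtcars, saveRDS)
all.equal(load_from_bytes(bytes, readRDS), mtcars)
#> [1] TRUE
```

Because only raw bytes come back, nothing in the result depends on a temp file surviving past the call that created it.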

juliasilge commented 2 years ago

HA, it looks like the embed method was also serializing and unserializing the file path rather than the file's contents 🙊

Truly this cracks me up 😆
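A minimal base R illustration of why serializing a file path (as described above for the embed method) is not enough. This is not the embed method's actual code. The path survives the round trip, but the file it points to does not.

```r
# Save something to a temp file, then "bundle" only its path (the bug).
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)
bad_bundle <- serialize(list(path = path), connection = NULL)

# Simulate the original session's temp files being cleaned up.
unlink(path)

restored <- unserialize(bad_bundle)
restored$path == path
#> [1] TRUE
file.exists(restored$path)
#> [1] FALSE
```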

juliasilge commented 2 years ago

I have fixed all the bundle methods that use temp files in #42, but I am having a hard time coming up with a testing strategy that would have caught our previous problems. We can call unbundle() from a new R session, but the temp files aren't deleted until the old, parent R session ends. Could we do something via withr::defer()? For example, adding something like:

withr::defer(unbundle(mod_bundle))

to see whether we can unbundle the object after the R session ends? That's not quite right, though, because it still runs without error on the old, broken bundlers (embed, keras, etc.).
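To make the gap concrete, a minimal sketch, assuming callr is installed and using mtcars as placeholder data. A file created in the parent session's tempdir() is still readable from a fresh callr::r() session, so a path-based bundle can look fine in a test even though it would fail once the parent session exits.

```r
# Parent session (e.g. the testthat session) writes a temp file.
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)

# A brand-new R session can still read it, because the parent session is
# still alive and has not cleaned up its tempdir().
child_rows <- callr::r(function(p) nrow(readRDS(p)), args = list(path))
child_rows
#> [1] 32

unlink(path)
```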

juliasilge commented 2 years ago

~~We could write out bundles to temporary .rds files and read them back in and try to unbundle them.~~ No, that doesn't help because then the other temp files are still there.

Maybe we should loop someone else in on this.

simonpcouch commented 2 years ago

Would it work to put together a quick helper that wipes the session's tempdir() before the callr::r() portion of the test? Something à la:

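clean_tempdir <- function() {
  # Delete everything inside this session's tempdir(), including hidden files,
  # but keep the directory itself so the session can keep writing temp files.
  to_rm <- list.files(tempdir(), full.names = TRUE, all.files = TRUE, no.. = TRUE)
  unlink(to_rm, recursive = TRUE)
}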

...though this does seem potentially invasive.

juliasilge commented 2 years ago

It does seem pretty invasive. I think I'll outline what we are running into and ask for some help on this.

mikemahoney218 commented 2 years ago

Sorry if this is missing something obvious, but is there any way to serialize the object in one callr session, and then try unserializing in another callr session?

If you have the first callr session return tempdir(), you could also unlink that entire directory to make sure you aren't accidentally persisting files from the original session.
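A sketch of this suggestion, assuming callr is installed. The "bundle" here is a deliberately broken, path-based stand-in (hypothetical, not a real bundle method) so the harness has something to catch. Session 2 returns the serialized payload plus its own tempdir(), the parent unlinks that directory, and session 3 checks whether the file is still reachable.

```r
# Session 2: "bundle" the broken way, by serializing only a temp file's path,
# and report where this session kept its temp files.
res <- callr::r(function() {
  path <- tempfile(fileext = ".rds")
  saveRDS(datasets::mtcars, path)
  list(
    payload     = serialize(list(path = path), connection = NULL),
    session_tmp = tempdir()
  )
})

# Parent: make sure nothing in session 2's tempdir() can leak through.
unlink(res$session_tmp, recursive = TRUE)

# Session 3: try to "unbundle" from the payload alone.
ok <- callr::r(function(payload) {
  obj <- unserialize(payload)
  file.exists(obj$path)
}, args = list(res$payload))
ok
#> [1] FALSE
```

A correct bundle method would pass the same harness, because its payload carries the file's contents instead of a path.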

juliasilge commented 2 years ago

Ah, use callr::r() twice! Let's give that a go.

juliasilge commented 2 years ago

This is worse than I thought it would be.

Consider three R sessions:

session 1, the original testthat session
session 2, called via callr::r() to bundle the model
session 3, called separately via callr::r() to unbundle the model

Our current testing strategy needs us to have the model in both session 1 (to get predictions we know are correct and to butcher where appropriate) and session 2 (to be returned as a bundle). These models can't be passed from one session to the other (the whole pain point motivating this package!), and we can't write helper functions to create them: not in the package, because of R CMD check, and not in a test helper file, because the callr sessions can't find them.

This feels kind of bad, but could session 2 return a list to session 1 containing both the model bundle and the "regular" predictions? Something like session_res <- callr::r(function() {mass of code}), where session_res$mod_bundle is the bundle and session_res$mod_preds are the original predictions?
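A rough sketch of that shape, assuming callr is installed and using lm() on mtcars as a placeholder model. Here serialize()/unserialize() stand in for bundle::bundle()/bundle::unbundle() so the sketch runs with only base R and callr. Session 1 never holds a live model, only the bundle's bytes and the reference predictions.

```r
# Session 2: fit the model, record reference predictions, and bundle it.
# serialize() is a stand-in for bundle::bundle() in this sketch.
session_res <- callr::r(function() {
  fit <- stats::lm(mpg ~ wt + hp, data = datasets::mtcars)
  list(
    mod_bundle  = serialize(fit, connection = NULL),
    mod_preds   = unname(stats::predict(fit, newdata = datasets::mtcars[1:6, ])),
    session_tmp = tempdir()
  )
})

# Session 1: remove session 2's temp files so nothing can leak through.
unlink(session_res$session_tmp, recursive = TRUE)

# Session 3: restore the model (stand-in for bundle::unbundle()) and
# predict on the same rows.
new_preds <- callr::r(function(mod_bundle) {
  fit <- unserialize(mod_bundle)
  unname(stats::predict(fit, newdata = datasets::mtcars[1:6, ]))
}, args = list(session_res$mod_bundle))

all.equal(new_preds, session_res$mod_preds)
#> [1] TRUE
```

In a real test, the final comparison would be a testthat expectation, and the model-building code only ever lives inside the session 2 function, which sidesteps the helper-file problem.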

rstudio/bundle issue #40: tempfile persistence