rstudio / bundle

Prepare objects for serialization with a consistent interface
https://rstudio.github.io/bundle/
Other
27 stars 4 forks source link

Update keras bundle method #41

Closed juliasilge closed 2 years ago

juliasilge commented 2 years ago

Closes #37

The keras method now writes the model to a directory, tars that directory, serializes the tar file, then untars and loads the model at unbundle time.

simonpcouch commented 2 years ago

I think this may not resolve #37 as-is—it looks to me like this stores the serialized path to the tar file rather than the tar file itself.

x <- rnorm(1000)

file_loc <- fs::file_temp(pattern = "bundle", ext = ".tar.gz")
tmp_dir <- fs::dir_create(tempdir(), "bundle")
x_file_loc <- tempfile(tmpdir = tmp_dir, fileext = ".Rda")
save(x, file = x_file_loc)

withr::with_dir(
  tmp_dir,
  utils::tar(
    tarfile = file_loc,
    compression = "gzip",
    tar = "internal"
  )
)

serialized <- serialize(file_loc, connection = NULL)

unserialize(serialized)
#> /var/folders/6c/w21prsj167b_x82q4_s45t340000gn/T/RtmpStMLCa/bundle103e4276eef15.tar.gz

Created on 2022-08-24 by the reprex package (v2.0.1)

I may be missing something here—does this reprex misrepresent the proposed changes?

juliasilge commented 2 years ago

OH MY GOSH still only serializing the path 😭

I looked at it a bit more and I think this is what we want?

tmp_dir <- fs::dir_create(tempdir(), "bundle")
file_loc <- fs::file_temp(pattern = "bundle", ext = ".tar.gz")
keras::save_model_tf(mod, tmp_dir)

withr::with_dir(
  tmp_dir,
  utils::tar(
    tarfile = file_loc,
    compression = "gzip",
    tar = "internal"
  )
)

raw <- readBin(file_loc, what = "raw", file.size(file_loc), endian = "little")

new_file <- fs::file_temp(pattern = "bundle", ext = ".tar.gz")
unbundle_dir <- fs::dir_create(tempdir(), "unbundle")
writeBin(raw, new_file, endian = "little")
utils::untar(new_file, exdir = unbundle_dir)

load_model_tf(unbundle_dir)
juliasilge commented 2 years ago

The withr use is needed, I believe, because tar doesn't play nicely if it's not in the working directory; the alternative is using on.exit and changing the directory manually, etc. IMO using withr is both more convenient and safer in the long run. I think we also will be able to use withr for #40 to make sure we are in a new working dir when unbundling.

The fs use is more a "nice to have" but it is in fact much nicer in my experience. We may need it more in the future (better for anything needing directories, I think) but let's move it to Suggests too.

juliasilge commented 2 years ago

Thank you so much @simonpcouch! 🙌