mattwarkentin closed this issue 4 years ago.
Definitely a great idea and something I would love to support. Reading and writing objects to storage does get us most of the way there. However, there is one more requirement: an injective transformation to serialize exportable objects in memory. `keras` does this with `serialize_model()` and `unserialize_model()`, and it ensures we can send data to and from distributed/parallel workers over a network.
Predictably, I cannot simply serialize `torch` objects to raw vectors in base R, and I fear `torch::as_array()` is not injective (we lose information when we try to transform back).
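To illustrate the information loss (a sketch, assuming a tensor that carries metadata beyond its values): round-tripping through `as_array()` keeps only the numeric values, so properties such as `requires_grad` do not survive the reconstruction.

```r
library(torch)

# A tensor that carries metadata beyond its values.
original <- torch_tensor(matrix(1:4, 2, 2), dtype = torch_float64(),
                         requires_grad = TRUE)

# as_array() keeps only the values...
values <- as_array(original)

# ...so rebuilding from the array loses requires_grad (and any device info).
rebuilt <- torch_tensor(values, dtype = torch_float64())
original$requires_grad  # TRUE
rebuilt$requires_grad   # FALSE
```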
```r
library(torch)
x <- array(runif(8), dim = c(2, 2, 2))
original <- torch_tensor(x, dtype = torch_float64())

# Try to serialize to raw.
tmp <- tempfile()
torch_save(original, tmp)
raw <- readBin(tmp, what = "raw")
print(raw)
#> [1] 1f

# Try to unserialize from raw.
tmp <- tempfile()
writeBin(raw, tmp)
out <- torch_load(tmp)
#> Error in readRDS(path): error reading from connection
```

Created on 2020-09-30 by the reprex package (v0.3.0)
So I will definitely keep an eye on this, but I will close it until we have https://github.com/mlverse/torch/issues/270 or a workaround.
Hmm, interesting. Well, I'm glad I could put this on your radar, and hopefully in the future there will be a solution on the `torch` side of things that will allow `targets` to offer support for this feature.
Wait a minute: I'm totally wrong about https://github.com/wlandau/targets/issues/179#issuecomment-701703681. I forgot that `readBin()` weirdly defaults to `n = 1` instead of the size of the file. Reopening.
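The pitfall in miniature (base R only, no `torch` needed): `readBin()`'s `n` argument defaults to `1L`, so without `n = file.size(...)` only the first element is read.

```r
# Base-R illustration of the readBin() pitfall: n defaults to 1,
# so only one element (here, one raw byte) is read.
tmp <- tempfile()
writeBin(as.raw(1:10), tmp)

length(readBin(tmp, what = "raw"))                      # 1
length(readBin(tmp, what = "raw", n = file.size(tmp)))  # 10
```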
```r
library(torch)
x <- array(runif(8), dim = c(2, 2, 2))
original <- torch_tensor(x, dtype = torch_float64())

# Serialize to raw.
tmp <- tempfile()
torch_save(original, tmp)
raw <- readBin(tmp, what = "raw", n = file.size(tmp))

# Unserialize from raw.
tmp <- tempfile()
writeBin(raw, tmp)
torch_load(tmp)
#> torch_tensor
#> (1,.,.) =
#>   0.8850  0.2682
#>   0.9796  0.9439
#>
#> (2,.,.) =
#>   0.2353  0.4076
#>   0.1360  0.8992
#> [ CPUDoubleType{2,2,2} ]
```

Created on 2020-09-30 by the reprex package (v0.3.0)
For reference, you can skip the save-to-temp-file step with something like:
```r
library(torch)
x <- array(runif(8), dim = c(2, 2, 2))
original <- torch_tensor(x, dtype = torch_float64())

# Serialize to an in-memory raw connection instead of a file.
con <- rawConnection(raw(), open = "wr")
torch_save(original, con)
r <- rawConnectionValue(con)
close(con)

# Unserialize directly from the raw vector.
torch_load(rawConnection(r, open = "r"))
```
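This pattern could be wrapped in a pair of helpers (the helper names here are my own, not part of `torch`), taking care to read the connection's value before closing it:

```r
library(torch)

# Hypothetical helpers (not part of torch) for in-memory (de)serialization.
torch_serialize_raw <- function(object) {
  con <- rawConnection(raw(), open = "wr")
  on.exit(close(con))
  torch_save(object, con)
  rawConnectionValue(con)  # must be read while con is still open
}

torch_unserialize_raw <- function(bytes) {
  con <- rawConnection(bytes, open = "r")
  on.exit(close(con))
  torch_load(con)
}

original <- torch_tensor(array(runif(8), dim = c(2, 2, 2)))
bytes <- torch_serialize_raw(original)
identical(as_array(torch_unserialize_raw(bytes)), as_array(original))
```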
Even better, thanks!
Implemented 2 new formats in https://github.com/wlandau/targets/commit/1d021cdb90cacb47dc0fe452b0e3e77d69f6adae and https://github.com/wlandau/targets/commit/549b21e1bb3c4d049f6a47b15fefc06c93b4163a:

- `"torch"`: local storage in `_targets/objects/`
- `"aws_torch"`: cloud storage on Amazon S3

Loving how I can implement and test without a Python env!
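In practice (a sketch assuming the `targets` API; the tensor command is a stand-in for a real model-fitting function), a pipeline would opt in per target via the `format` argument:

```r
# _targets.R sketch: persist a torch object with the new "torch" format.
library(targets)

list(
  tar_target(
    model,
    torch::torch_tensor(matrix(runif(4), 2, 2)),  # stand-in for a fitted model
    format = "torch"                              # serialized via torch_save()/torch_load()
  )
)
```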
Amazing! Glad it all worked out.
> Loving how I can implement and test without a Python env!
Haha, agreed. This is exactly why I think `torch` is going to gain favour in the R community compared to `tensorflow`/`keras` implementations.
Prework
Proposal
Hi @wlandau,

With the recent release of `{torch}` for R (see here), I thought it might be useful to offer a special storage format for `torch` models/objects similar to what is offered for `keras`. I think the release of `torch` will see it used quite a bit for deep learning moving forward, as it provides a native binding to the `libtorch` C++ library without requiring any Python wrappers, so I think offering support for this enhances `targets`.

Since `torch` objects are really just pointers to C++ objects in memory, they can't be serialized normally: you basically just serialize a pointer, and the actual object probably won't persist. Existing serialization formats for standard R objects in `targets`, such as `saveRDS()` and `qs::qsave()`, won't work. `torch::torch_save()` and `torch::torch_load()` seem to do the work of properly serializing and loading `torch` objects and would probably give you what you need to offer a `torch` format.
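To make the failure mode concrete (a hedged sketch; the exact error text varies by version): an RDS round trip preserves only the external pointer, which is no longer valid after loading, while `torch_save()`/`torch_load()` write and rebuild the underlying tensor data.

```r
library(torch)

x <- torch_tensor(1:3)

# saveRDS() serializes only the external pointer, so the restored
# object no longer refers to a live C++ tensor.
rds <- tempfile()
saveRDS(x, rds)
broken <- readRDS(rds)
try(print(broken))  # errors: the pointer is no longer valid

# torch_save()/torch_load() serialize the tensor data itself.
pt <- tempfile()
torch_save(x, pt)
restored <- torch_load(pt)
print(restored)  # a valid tensor with the same values
```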