Open mratsim opened 6 years ago
After further research including Thrift, Avro, Cap'n Proto and FlatBuffers, I've concluded that a binary serializer with schema would be best:
Suggested by @sherjilozair, .npy file support would be great.
Description, specs and C utilities here:
In the same vein, it will be useful to be able to load common files like:
NESM also should be considered, not sure it can be used for memory-mapping though: https://xomachine.github.io/NESM/
I'll try to find some time in the near future to allow loading of .hdf5 files using nimhdf5 (arraymancer is a dependency anyways). Do you know of any examples of neural networks stored in an .hdf5 file, which I can use as reference?
Lots of examples here: https://github.com/fchollet/deep-learning-models/releases/
Sweet, thanks!
This issue is still open and I am wondering… what would be the canonical way to save/load a neural network defined in Arraymancer in early 2020? HDF5, msgpack, … ? Will there be an interface for serialization of NNs for deployments or do I have to define my own structure? An example for noobs like me would be really helpful, though I will also try it myself.
Currently there is no model-wide saving support. For the future it will probably be HDF5 and/or ONNX/Protobuf.
For individual tensors you can use Numpy or HDF5.
See the tests for usage:
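Separately from the linked tests, here is a minimal sketch of the Numpy route for a single tensor. It assumes Arraymancer's `write_npy` and `read_npy` procs; the file name and tensor values are arbitrary.

```nim
import std/os
import arraymancer

# Build a small 2x3 float32 tensor (arbitrary example values).
let original = [[1'f32, 2'f32, 3'f32],
                [4'f32, 5'f32, 6'f32]].toTensor

# Write it to a .npy file in the temp directory (path chosen for illustration).
let path = getTempDir() / "example_tensor.npy"
original.write_npy(path)

# Read it back; the element type has to be given explicitly.
let restored = read_npy[float32](path)
echo restored.shape
```

The element type passed to `read_npy` has to match what you expect to get back, so it is worth recording the dtype alongside the file name.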
The best way forward would be to implement a serializer to HDF5 that can (de)serialize any type made only of tensors, including support for nested types. Then, when saving a model, we can pass it to that serializer.
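A rough sketch of what such a generic walker could look like. To stay with calls used elsewhere in this thread it writes `.npy` files rather than HDF5, so it is a stand-in for the idea and not the proposed HDF5 serializer. The `Dense` and `Net` types and the `saveTensors`/`loadTensors` procs are hypothetical names made up for the example, and only `Tensor[float32]` fields are handled.

```nim
import std/os
import arraymancer

type
  # Hypothetical nested types made only of tensors, for illustration.
  Dense = object
    weight: Tensor[float32]
    bias: Tensor[float32]
  Net = object
    hidden: Dense
    output: Dense

proc saveTensors[T: object](obj: T, prefix: string) =
  ## Recursively walk `obj`; write each Tensor[float32] field to
  ## "<prefix>_<field path>.npy".
  for name, field in fieldPairs(obj):
    when field is Tensor[float32]:
      field.write_npy(prefix & "_" & name & ".npy")
    elif field is object:
      saveTensors(field, prefix & "_" & name)

proc loadTensors[T: object](obj: var T, prefix: string) =
  ## Mirror of `saveTensors`: fill each Tensor[float32] field from its file.
  for name, field in fieldPairs(obj):
    when field is Tensor[float32]:
      field = read_npy[float32](prefix & "_" & name & ".npy")
    elif field is object:
      loadTensors(field, prefix & "_" & name)

let net = Net(
  hidden: Dense(weight: ones[float32](4, 3), bias: zeros[float32](1, 4)),
  output: Dense(weight: ones[float32](2, 4), bias: zeros[float32](1, 2)))

let prefix = getTempDir() / "net"
net.saveTensors(prefix)

var restored: Net
restored.loadTensors(prefix)
```

Because the field path is encoded in the file name, nested types need no extra bookkeeping. A single-file format such as HDF5 would use the same path as a dataset name instead.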
I didn't work on the NN part of Arraymancer during 2019 because I'm revamping the low-level routines in Laser and will adopt a compiler approach: https://github.com/numforge/laser/tree/master/laser/lux_compiler. This is to avoid maintaining a code path for CPU, Cuda, OpenCL, Metal, ...
I'm also working on Nim core multithreading routines to provide a high-level, efficient, lightweight and composable foundation on which to build multithreaded programs like Arraymancer, as I've had and still have multiple OpenMP issues (see: https://github.com/mratsim/weave).
And as I have a full-time job as well, I'm really short on time to tackle issues that require careful design and usability tradeoffs.
For practicality, it could make sense to provide this functionality in a limited manner (explained below) rather than waiting to come up with a perfect solution that covers all cases, especially considering that this is a very central feature and the issue has been open for almost three years. I would be interested in helping you with it.
I think it is much more critical to save model parameters than network topology and context information, for example by providing a function similar to PyTorch's `.save` for `state_dict`s (as described here). The network itself should not be contained in the saved file; it is defined programmatically and is not necessarily part of the data. Losing context information is in general also not a big deal, especially if the trained model is being saved to be run later and not trained further (which I guess is a common use case). So the limited solution would be to save just the parameters and, when loading, create the model from scratch using the same code and then load the parameters. This can be implemented externally without changing the already functioning parts, or by modifying the `network` macro to generate `save`/`load` functions.
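To make the parameters-only idea concrete, here is a minimal sketch of a `state_dict`-style save and load. Everything named here (`TinyModel`, `newTinyModel`, `stateDict`, `saveParams`, `loadParams`, the directory name) is hypothetical and written by hand, not generated by the `network` macro. It assumes Arraymancer's `write_npy`/`read_npy` for the storage.

```nim
import std/[os, tables]
import arraymancer

type
  TinyModel = object   # hypothetical hand-written model
    weight: Variable[Tensor[float32]]
    bias: Variable[Tensor[float32]]

proc newTinyModel(ctx: Context[Tensor[float32]]): TinyModel =
  ## The topology lives in code; only the parameter values get persisted.
  result.weight = ctx.variable(zeros[float32](2, 3), requires_grad = true)
  result.bias = ctx.variable(zeros[float32](1, 2), requires_grad = true)

proc stateDict(m: TinyModel): OrderedTable[string, Tensor[float32]] =
  ## Name -> parameter tensor, loosely modelled on PyTorch's state_dict.
  result["weight"] = m.weight.value
  result["bias"] = m.bias.value

proc saveParams(m: TinyModel, dir: string) =
  createDir(dir)
  for name, tensor in m.stateDict:
    tensor.write_npy(dir / (name & ".npy"))

proc loadParams(m: var TinyModel, dir: string) =
  ## Overwrite the values in place; each Variable keeps its context.
  m.weight.value = read_npy[float32](dir / "weight.npy")
  m.bias.value = read_npy[float32](dir / "bias.npy")

let ctx = newContext Tensor[float32]
let dir = getTempDir() / "tiny_model_params"

var trained = newTinyModel(ctx)
trained.weight.value = ones[float32](2, 3)   # stand-in for actual training
trained.saveParams(dir)

var fresh = newTinyModel(ctx)                # rebuilt from the same code
fresh.loadParams(dir)
```

The loading side depends on the model-construction code being available, which is the limitation described above.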
The major problem with this approach is the way the `network` macro is implemented. Since the context is given as a variable, the model cannot simply be reused. I cannot really understand the logic behind this, but please let me know if I am missing something. The `network` macro creates a type that could otherwise be reused, but it is reduced to a singleton because of the context. Wouldn't it be better to create the type so that every object creates its own context, or so that the context is assigned recursively down the tensors with a function? Such a function would be a workaround for the model reusability problem and also a good basis for an alternative implementation if the described change is desired.
Please let me know if you would be interested in a solution like this, if yes, I would gladly take the issue and provide a more concrete design before moving on with an implementation.
@mratsim Forgot to mention.
Thanks for the great library by the way.
Ran into this too, running the `ex02_handwritten_digits_recognition.nim` example. I use MsgPack a lot, and so tried `msgpack4nim`. It blew up, but after reading this I dove into it a bit more. After a little experimenting it seems there's an easy way to save the model weights by just adding a couple of helper procs to `msgpack4nim` for storing the different layer types. Really simple actually!
Here's all that's needed for the handwritten digits example. Define helpers for `Conv2DLayer` and `LinearLayer` for the `msgpack4nim` library and you can do the standard load/save from that library. The resulting trained file is ~22 MB. The limitation, as noted above, is that you need the code defining the model for this to work. Still, it's useful.
import arraymancer, streams, msgpack4nim

# Pack only the weight and bias tensors of a Conv2D layer.
proc pack_type*[ByteStream](s: ByteStream, layer: Conv2DLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

# Unpack in the same order, writing into the existing Variables.
proc unpack_type*[ByteStream](s: ByteStream, layer: var Conv2DLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

# Same pair of helpers for Linear layers.
proc pack_type*[ByteStream](s: ByteStream, layer: LinearLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var LinearLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

# Load a previously saved object from file `fl` into `data`.
proc loadData*[T](data: var T, fl: string) =
  var ss = newFileStream(fl, fmRead)
  if not ss.isNil():
    ss.unpack(data)
    ss.close()
  else:
    raise newException(ValueError, "no such file?")

# Save `data` to file `fl`; raise instead of silently skipping the write.
proc saveData*[T](data: T, fl: string) =
  var ss = newFileStream(fl, fmWrite)
  if not ss.isNil():
    ss.pack(data)
    ss.close()
  else:
    raise newException(IOError, "cannot open file for writing: " & fl)
Then calling `saveData` saves the model's weights and biases:
var model = ctx.init(DemoNet)
# ... train model ...
model.saveData("test_model.mpack")
## restart model
model.loadData("test_model.mpack")
## continues at last training accuracy
A note on the above: MsgPack does pretty well in size compared to pure JSON. The exported msgpack file from above is ~22MB (16MB when bzipped); converted to JSON it is an 87MB file (33MB when bzipped). Not sure how HDF5 or npy would compare. Probably similar, unless the Tensor type were converted from float32 or some other optimization were applied.
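For a feel of why text formats come out larger, here is a small stdlib-only sketch comparing the raw size of a `float32` buffer with a JSON-style text rendering of the same values. The values and the array length are arbitrary, and the exact text size depends on how your Nim version prints floats, so treat the ratio as indicative only.

```nim
import std/[math, sequtils, strutils]

# 1000 arbitrary float32 "weights".
var weights = newSeq[float32](1000)
for i in 0 ..< weights.len:
  weights[i] = float32(sin(float(i)))

# Raw binary payload: 4 bytes per float32.
let binaryBytes = weights.len * sizeof(float32)

# JSON-style text: each value printed in decimal, comma separated.
let jsonText = "[" & weights.mapIt($it).join(",") & "]"

echo "raw float32 bytes:     ", binaryBytes
echo "JSON-style text bytes: ", jsonText.len
```

Binary formats such as MsgPack, npy or HDF5 add some framing on top of the raw payload, which this sketch does not model.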
I'm running into what looks to be incomplete saving of a trained model. Saving a fully trained `DemoNet` model (with 90+% accuracy) using the previously described `msgpack4nim` method, then reloading the model and running the validation/accuracy testing section, results in only about 6% accuracy.
The `msgpack4nim` library uses the object fields to know what to serialize. Iterating over `fieldPairs(model)` for the `DemoNet` (https://github.com/mratsim/Arraymancer/blob/1a2422a1e150a9794bfaa28c62ed73e3c7c41e47/examples/ex02_handwritten_digits_recognition.nim#L36) model only prints out fields for: "hidden", "classifier", "cv1", and "cv2". It's missing "x" (Input), "mp1" (MaxPool2D), "fl" (Flatten).
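For readers unfamiliar with `fieldPairs`, here is a stdlib-only sketch of the behaviour being described: it enumerates exactly the fields declared on the object type, so anything that is not stored as a field cannot show up. The `Layer` and `Net` types and the `fieldNames` helper are hypothetical stand-ins, not the types generated by the `network` macro.

```nim
type
  Layer = object            # hypothetical stand-in for a layer with state
    weight: seq[float32]
    bias: seq[float32]
  Net = object              # only stateful layers are declared as fields
    cv1: Layer
    hidden: Layer

proc fieldNames[T: object](obj: T): seq[string] =
  ## Collect the names of all fields declared on `obj`'s type.
  for name, value in fieldPairs(obj):
    result.add name

echo fieldNames(Net())   # @["cv1", "hidden"]
```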
Originally I thought those must not have state and therefore would not need to be stored. But now, with the serialize/deserialize not working as intended, I am not sure. Is there any other state that I would need to save to fully serialize and deserialize a model? Perhaps the deserialization isn't restoring all the correct fields?
Here are the "custom" type overrides for the serialized layers for reference:
import arraymancer, streams, msgpack4nim

proc pack_type*[ByteStream](s: ByteStream, layer: Conv2DLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var Conv2DLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

proc pack_type*[ByteStream](s: ByteStream, layer: LinearLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var LinearLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)
AFAIK you're doing the correct thing for weights/bias: https://github.com/mratsim/Arraymancer/blob/88edbb6768b7b7ecd2bf20dd19b27f88bd341ea2/src/arraymancer/nn_dsl/dsl_types.nim#L63-L81
For the others, I don't store the shape metadata in the layers (they are transformed away at compile time): https://github.com/mratsim/Arraymancer/blob/88edbb6768b7b7ecd2bf20dd19b27f88bd341ea2/src/arraymancer/nn_dsl/dsl_types.nim#L43-L57
but I probably should, to ease serialization.
Ok, thanks, it's good to know the weights/biases seem correct. There's a good chance I am missing a part of the serialization or messing up the prediction. All of the deserialized Tensor values appear to be correct.
One last question: is there anything special about the `Variable[T]` wrappers? Currently I'm instantiating a new instance of the model and loading the saved data into it:
var model = ctx.init(DemoNet)
...
model.loadData(model_file_path)
The `loadData` call will unpack all of the fields (by copying or setting fields, I presume). Could that be messing up variable contexts somehow? I wouldn't think so, but I'm unsure about some of the nuances of Nim regarding references and copies.
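On the references-versus-copies question, a stdlib-only sketch of the relevant Nim semantics may help. The `Ctx`, `Var` and `Layer` types below are hypothetical stand-ins that only mimic the shape of a context, a `Variable` and a layer. The point is that assigning to a field through a `ref` mutates the shared object in place, so the link to the context is untouched.

```nim
type
  Ctx = ref object          # hypothetical stand-in for an autograd context
    id: int
  Var = ref object          # hypothetical stand-in for Variable[T]
    context: Ctx
    value: seq[float32]
  Layer = object
    weight: Var

proc overwrite(layer: var Layer, newValue: seq[float32]) =
  ## Mimics an unpack helper that writes into `layer.weight.value`.
  layer.weight.value = newValue

let ctx = Ctx(id: 1)
var layer = Layer(weight: Var(context: ctx, value: @[0'f32, 0'f32]))
let alias = layer.weight    # second reference to the same Var

layer.overwrite(@[1.5'f32, 2.5'f32])

echo alias.value                    # the alias sees the new value
echo layer.weight.context == ctx    # still the same context object
```

This only covers helpers that write into `.value` of an existing variable, as the `unpack_type` overrides above do. A deserializer that allocated a brand-new `ref` for the variable itself would behave differently.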
> For the other I don't store the shape metadata in the layers (they are compile-time transformed away)
Eventually that would be nice. I currently am just redefining the model code which works for my use case.
`Variable` stores the following
The `Context` starts empty
https://github.com/mratsim/Arraymancer/blob/1a2422a1e150a9794bfaa28c62ed73e3c7c41e47/src/arraymancer/autograd/autograd_common.nim#L44-L57
and then, as we pass through layers, a record of the layers applied is appended to `Context.nodes`. `no_grad` is a runtime flag to activate/deactivate recording depending on training or inference, so there is no need to save it.
The `value` field is the actual weight and must be saved.
The `grad` field is not important; it is used for accumulating the gradient of the layer in backpropagation when `requires_grad` is set to true. Then the optimizer (SGD, Adam) will multiply it by the learning rate (for SGD) or something more advanced (for Adam) and subtract the result from the `value` field.
It is always zeroed on training, so there is no need to serialize it: https://github.com/mratsim/Arraymancer/blob/1a2422a1/src/arraymancer/autograd/gates_blas.nim#L18-L58
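As a worked illustration of why only `value` needs saving, here is a stdlib-only sketch of a plain SGD step. The `Param` type and `sgdStep` proc are hypothetical and are not Arraymancer's optimizer. Zeroing the gradient right after the update is a simplifying assumption made for the sketch.

```nim
type
  Param = object            # hypothetical, not Arraymancer's Variable
    value: seq[float32]     # the learned weights: this is what gets saved
    grad: seq[float32]      # scratch space, rebuilt on every backward pass

proc sgdStep(p: var Param, lr: float32) =
  ## value <- value - lr * grad, then reset the gradient accumulator.
  for i in 0 ..< p.value.len:
    p.value[i] -= lr * p.grad[i]
    p.grad[i] = 0'f32

var p = Param(value: @[1'f32, 2'f32], grad: @[0.5'f32, -1'f32])
p.sgdStep(0.1'f32)

echo p.value   # roughly 0.95 and 2.1
echo p.grad    # back to zeros, so nothing worth serializing
```

After the step, `grad` holds no information that the next training iteration would not recompute, which matches the advice to leave it out of the saved file.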
Thanks, I tried reading the dsl.nim but got lost on where things were defined.
Based on those code snippets, the only place that I'm not sure is set up correctly is `context`; perhaps it's getting set incorrectly. The `grad` and `requires_grad` fields shouldn't be needed. They're probably being overwritten, but it sounds like it shouldn't matter in the `ctx.no_grad_mode` where I'm doing the prediction.
If I understand it correctly, `nodes` isn't used for computation or for doing a `model.forward`, right?
I'm only saving the `model` after training. Are any of the above stored in the `ctx`?
edit: Looking through this more, it doesn't appear so. I think the model is being saved/restored correctly. It may be a bug in how I'm ordering my tests when doing predictions. The `squeeze`/`unsqueeze` operations didn't work well on my single-item labels.
Probably not terribly useful long-term, but for rough purposes you might try https://github.com/disruptek/frosty. It’s kinda designed for “I know what I’m doing” hacks and it could help your differential diagnosis.
I used msgpack4nim but wanted more of a fire-and-forget solution that I could trust.
Any Update? How can I save and retrieve a model's weights?
> Any Update? How can I save and retrieve a model's weights?
I'm so at a loss for saving and loading models. Respectfully, how are we supposed to use Arraymancer for deep learning without being able to do this?
Things I learned from trying to solve this problem all day, hope it helps someone:
In order to save/load the weights and biases of your model, you'll first need to define these manually.
Working test example:
import arraymancer

# INPUT_D, HIDDEN_D and OUTPUT_D are integer constants defined elsewhere.
type
  LinearLayer = object
    weight: Variable[Tensor[float32]]
    bias: Variable[Tensor[float32]]
  ExampleNetwork = object
    hidden: LinearLayer
    output: LinearLayer

# Uses the `ctx` that is in scope at the call site.
template weightInit(shape: varargs[int], init_kind: untyped): Variable =
  ctx.variable(
    init_kind(shape, float32),
    requires_grad = true)

proc newExampleNetwork(ctx: Context[Tensor[float32]]): ExampleNetwork =
  result.hidden.weight = weightInit(HIDDEN_D, INPUT_D, kaiming_normal)
  result.hidden.bias = ctx.variable(zeros[float32](1, HIDDEN_D), requires_grad = true)
  result.output.weight = weightInit(OUTPUT_D, HIDDEN_D, yann_normal)
  result.output.bias = ctx.variable(zeros[float32](1, OUTPUT_D), requires_grad = true)

proc forward(network: ExampleNetwork, x: Variable): Variable =
  result = x.linear(
    network.hidden.weight, network.hidden.bias).relu.linear(
      network.output.weight, network.output.bias)
Then, you'll need to create your save/load procs. I'll save you the headache here as well: use numpy files. Long story short, forget about HDF5, and the others aren't as efficient.
Working test example:
# Write each parameter tensor to its own .npy file.
proc save(network: ExampleNetwork) =
  network.hidden.weight.value.write_npy("hiddenweight.npy")
  network.hidden.bias.value.write_npy("hiddenbias.npy")
  network.output.weight.value.write_npy("outputweight.npy")
  network.output.bias.value.write_npy("outputbias.npy")

# Rebuild the network by wrapping each loaded tensor in a new Variable.
proc load(ctx: Context[Tensor[float32]]): ExampleNetwork =
  result.hidden.weight = ctx.variable(read_npy[float32]("hiddenweight.npy"), requires_grad = true)
  result.hidden.bias = ctx.variable(read_npy[float32]("hiddenbias.npy"), requires_grad = true)
  result.output.weight = ctx.variable(read_npy[float32]("outputweight.npy"), requires_grad = true)
  result.output.bias = ctx.variable(read_npy[float32]("outputbias.npy"), requires_grad = true)
At some point in the future I'll work on getting the `network` macro to integrate loading and saving models, but for now this POC/example should help push you in the right direction.
Format to be defined:
Non-binary (will certainly have size issues)
Binary