mratsim / Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
https://mratsim.github.io/Arraymancer/
Apache License 2.0

Save/load from disk (serializing / marshalling) #163

Open mratsim opened 6 years ago

mratsim commented 6 years ago

Format to be defined:

Non-binary (will certainly have size issues)

Binary

mratsim commented 6 years ago

After further research including Thrift, Avro, Cap'n Proto and FlatBuffers, I've concluded that a binary serializer with schema would be best:

mratsim commented 6 years ago

Suggested by @sherjilozair, .npy file support would be great.

Description, specs and C utilities here:

In the same vein, it will be useful to be able to load common files like:

NESM also should be considered, not sure it can be used for memory-mapping though: https://xomachine.github.io/NESM/

Vindaar commented 6 years ago

I'll try to find some time in the near future to allow loading of .hdf5 files using nimhdf5 (arraymancer is a dependency anyways). Do you know of any examples of neural networks stored in an .hdf5 file, which I can use as reference?

sherjilozair commented 6 years ago

Lots of examples here: https://github.com/fchollet/deep-learning-models/releases/

Vindaar commented 6 years ago

Sweet, thanks!

smartmic commented 4 years ago

This issue is still open and I am wondering… what would be the canonical way to save/load a neural network defined in Arraymancer in early 2020? HDF5, msgpack, … ? Will there be an interface for serialization of NNs for deployments or do I have to define my own structure? An example for noobs like me would be really helpful though, but I will try also myself.

mratsim commented 4 years ago

Currently there is no model-wide saving support. For the future it will probably be HDF5 and/or ONNX/Protobuf.

For individual tensors you can use Numpy or HDF5.

See the tests for usage:

HDF5

https://github.com/mratsim/Arraymancer/blob/407cae439d5f1f76431251c28a7e6fc9652444e3/tests/io/test_hdf5.nim#L46-L105

Numpy

https://github.com/mratsim/Arraymancer/blob/407cae439d5f1f76431251c28a7e6fc9652444e3/tests/io/test_numpy.nim#L17-L85


The best way forward would be to implement a serializer to HDF5 that can (de)serialize any type only made only of tensors, including with nested types support. Then when saving a model we can pass it to that serializer.


I didn't work on the NN part of Arraymancer during 2019 because I'm revamping the low-level routines in Laser and will adopt a compiler approach: https://github.com/numforge/laser/tree/master/laser/lux_compiler. This is to avoid maintaining a code path for CPU, Cuda, OpenCL, Metal, ...

And also I'm working on Nim core multithreading routines to provide a high-level efficient, lightweight and composable foundation to build multithreaded programs like Arraymancer on top as I've had and still have multiple OpenMP issues (see: https://github.com/mratsim/weave).

And as I have a full-time job as well, I'm really short on time to tackle issues that require careful design and usability tradeoffs.

arkocal commented 4 years ago

For practicality, it could make sense to provide this functionality in a limited manner (explained below) rather than waiting for a perfect solution that covers all cases, especially considering that this is a very central feature and the issue has been open for almost three years. I would be interested in helping you with it.

I think it is much more critical to save model parameters than network topology and context information, for example by providing a function similar to PyTorch's .save for state_dicts (as described here). The network itself should not be contained in the saved file; it is defined programmatically and is not necessarily part of the data. Losing context information is in general also not a big deal, especially if the trained model is being saved to be run later and not trained further (which I guess is a common use case). So the limited solution would be to save just the parameters and, when loading, create the model from scratch using the same code and then load the parameters into it. This can be implemented externally without changing the already functioning parts, or by modifying the network macro to generate save/load functions.
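A minimal sketch of this parameters-only idea, using only the Nim standard library. Nothing here is Arraymancer API: `StateDict`, `saveStateDict`, `loadStateDict` and the file layout are hypothetical names and choices made for illustration, and a plain `seq[float32]` stands in for each tensor (shapes are not stored, on the assumption that the model code recreates them).

```nim
import std/[streams, tables]

type StateDict = OrderedTable[string, seq[float32]]
  ## Hypothetical: parameter name -> flattened values.

proc saveStateDict(state: StateDict, path: string) =
  ## Layout (native byte order): entry count, then for each entry
  ## name length, name bytes, value count, raw float32 values.
  let s = newFileStream(path, fmWrite)
  if s.isNil:
    raise newException(IOError, "cannot open " & path & " for writing")
  defer: s.close()
  s.write(int32(state.len))
  for name, values in state:
    s.write(int32(name.len))
    s.write(name)
    s.write(int32(values.len))
    for v in values:
      s.write(v)

proc loadStateDict(path: string): StateDict =
  ## Reads back exactly what saveStateDict wrote.
  let s = newFileStream(path, fmRead)
  if s.isNil:
    raise newException(IOError, "cannot open " & path & " for reading")
  defer: s.close()
  let count = s.readInt32()
  for _ in 0 ..< count:
    let name = s.readStr(int(s.readInt32()))
    var values = newSeq[float32](int(s.readInt32()))
    for v in values.mitems:
      v = s.readFloat32()
    result[name] = values
```

With something like this, the model would be rebuilt from the same code and each parameter's values copied back in by name, which is the workflow described above.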

The major problem with this approach is the way the network macro is implemented. As the context is given a variable, the model cannot be simply reused. I cannot really understand the logic behind this, but please let me know if I am missing something. The network macro creates a type that could otherwise be reused, but it is reduced to a singleton because of the context. Wouldn't it be better to create the type so that every object creates its own context, or so that the context is assigned recursively down the tensors with a function? Such a function would be a workaround for the model reusability problem and also a good basis for an alternative implementation if the described change is desired.

Please let me know if you would be interested in a solution like this, if yes, I would gladly take the issue and provide a more concrete design before moving on with an implementation.

arkocal commented 4 years ago

@mratsim Forgot to mention.

Thanks for the great library by the way.

elcritch commented 4 years ago

Ran into this too, running the ex02_handwritten_digits_recognition.nim example. I use MsgPack a lot, and so tried msgpack4nim. It blew up, but after reading this I dove into it a bit more. After a little experimenting it seems there's an easy way to save the model weights by just adding a couple of helper procs to msgpack4nim for storing the different layer types. Really simple actually!

Here's all that's needed for the handwritten digits example. Define helpers for Conv2DLayer and LinearLayer for the msgpack4nim library and you can do the standard load/save from that library. The resulting trained file is ~22 MB. The limitation, as noted above, is that you need the code defining the model for this to work. Still, it's useful.

import arraymancer, streams, msgpack4nim

proc pack_type*[ByteStream](s: ByteStream, layer: Conv2DLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var Conv2DLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

proc pack_type*[ByteStream](s: ByteStream, layer: LinearLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var LinearLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

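proc loadData*[T](data: var T, fl: string) =
  ## Unpack `data` in place from the msgpack file `fl`.
  var ss = newFileStream(fl, fmRead)
  if not ss.isNil():
    ss.unpack(data)
    ss.close()
  else:
    raise newException(ValueError, "no such file?")

proc saveData*[T](data: T, fl: string) =
  ## Pack `data` into the msgpack file `fl`.
  var ss = newFileStream(fl, fmWrite)
  if not ss.isNil():
    ss.pack(data)
    ss.close()
  else:
    # Fail loudly instead of silently skipping the save.
    raise newException(ValueError, "cannot open file for writing: " & fl)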

Then calling saveData saves the whole model:

var model = ctx.init(DemoNet)
# ... train model ... 
model.saveData("test_model.mpack")
## restart model
model.loadData("test_model.mpack")
## continues at last training accuracy

elcritch commented 4 years ago

A note on the above, MsgPack does pretty well in size compared to pure JSON. The exported msgpack file from above is ~22MB (or 16MB when bzipped), or when converted to JSON it results in an 87MB file (33M when bzipped). Not sure how HDF5 or npy would compare. Probably similar, unless the Tensor type was converted from float32's or some other optimizations occur.
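The figures quoted above put the JSON export at roughly four times the msgpack size (87 MB vs 22 MB). A rough sketch of why text encodings tend to be larger for float data: a float32 takes 4 bytes as raw binary but usually more characters once printed as a decimal number. The helper names are hypothetical, and this does not reproduce msgpack4nim's actual encoding, which adds its own per-value overhead.

```nim
import std/strutils

proc binaryBytes(values: openArray[float32]): int =
  ## Size of the values stored as raw float32, without any framing.
  values.len * sizeof(float32)

proc jsonTextBytes(values: openArray[float32]): int =
  ## Size of the same values written as a JSON-style array of decimals.
  var parts: seq[string]
  for v in values:
    parts.add $v
  result = ("[" & parts.join(",") & "]").len

when isMainModule:
  let weights = [0.12345678'f32, -1.2345678'f32, 3.1415927'f32]
  echo "binary: ", binaryBytes(weights), " bytes"
  echo "text:   ", jsonTextBytes(weights), " bytes"
```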

elcritch commented 4 years ago

I'm running into what looks to be incomplete saving of a trained model. Saving a fully trained DemoNet model (with 90+% accuracy) using the previously described msgpack4nim method then reloading the model and running the validation/accuracy testing section results in only about 6% accuracy.

The msgpack4nim library uses the object fields to know what to serialize. Iterating over fieldPairs(model) for the DemoNet (https://github.com/mratsim/Arraymancer/blob/1a2422a1e150a9794bfaa28c62ed73e3c7c41e47/examples/ex02_handwritten_digits_recognition.nim#L36) model only prints out fields for: "hidden", "classifier", "cv1", and "cv2". It's missing "x" (Input), "mp1" (MaxPool2D), "fl" (Flatten).
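For readers unfamiliar with `fieldPairs`, here is a small standalone sketch of what such an iteration reports. `ToyLayer` and `ToyNet` are hypothetical stand-ins, not the types generated by Arraymancer's network macro; the point is only that `fieldPairs` sees the fields declared on the object and nothing else.

```nim
type
  ToyLayer = object
    weight: seq[float32]
    bias: seq[float32]
  ToyNet = object
    # Only layers that carry parameters are declared as fields here.
    cv1: ToyLayer
    classifier: ToyLayer

proc fieldNames[T: object](obj: T): seq[string] =
  ## Collects the field names in declaration order.
  for name, value in fieldPairs(obj):
    result.add name

when isMainModule:
  echo fieldNames(ToyNet())
```

A serializer driven by object fields can therefore only pick up state that is stored as a field; anything resolved at compile time never reaches it.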

Originally I thought those must not have state and therefore would not need to be stored. But now, with the serialize/deserialize not working as intended, I am not sure. Is there any other state that I would need to save to fully serialize and deserialize a model? Perhaps the deserializing isn't re-packing all the correct fields?

Here are the "custom" type overrides for the serialized layers for reference:

import arraymancer, streams, msgpack4nim

proc pack_type*[ByteStream](s: ByteStream, layer: Conv2DLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var Conv2DLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

proc pack_type*[ByteStream](s: ByteStream, layer: LinearLayer[Tensor[float32]]) =
  let weight: Tensor[float32] = layer.weight.value
  let bias: Tensor[float32] = layer.bias.value
  s.pack(weight) # let the compiler decide
  s.pack(bias) # let the compiler decide

proc unpack_type*[ByteStream](s: ByteStream, layer: var LinearLayer[Tensor[float32]]) =
  s.unpack(layer.weight.value)
  s.unpack(layer.bias.value)

mratsim commented 4 years ago

AFAIK you're doing the correct thing for weights/bias: https://github.com/mratsim/Arraymancer/blob/88edbb6768b7b7ecd2bf20dd19b27f88bd341ea2/src/arraymancer/nn_dsl/dsl_types.nim#L63-L81

For the other I don't store the shape metadata in the layers (they are compile-time transformed away) https://github.com/mratsim/Arraymancer/blob/88edbb6768b7b7ecd2bf20dd19b27f88bd341ea2/src/arraymancer/nn_dsl/dsl_types.nim#L43-L57

but I probably should to ease serialization

elcritch commented 4 years ago

Ok, thanks, that's good to know: the weights/biases seem correct. There's a good chance I am missing a part of the serialization or messing up the prediction. All of the re-serialized Tensor values appear to be correct.

One last question, is there anything special for the Variable[T] wrappers? Currently I'm instantiating a new instance of the model from the model:

var model = ctx.init(DemoNet)
...
model.loadData(model_file_path)

The loadData proc will unpack all of the fields (by copying or setting fields, I presume). Could that be messing up variable contexts somehow? I wouldn't think so, but I'm not sure about some of the nuances of Nim regarding references and copies.

elcritch commented 4 years ago

For the other I don't store the shape metadata in the layers (they are compile-time transformed away)

Eventually that would be nice. I currently am just redefining the model code which works for my use case.

mratsim commented 4 years ago

Variable stores the following

https://github.com/mratsim/Arraymancer/blob/1a2422a1e150a9794bfaa28c62ed73e3c7c41e47/src/arraymancer/autograd/autograd_common.nim#L59-L70

The Context starts empty https://github.com/mratsim/Arraymancer/blob/1a2422a1e150a9794bfaa28c62ed73e3c7c41e47/src/arraymancer/autograd/autograd_common.nim#L44-L57

and then as we pass through layers, a record of the layers applied is appended to Context.nodes. no_grad is a runtime flag to activate/deactivate recording depending on training or inference, so there is no need to save that.

The value field is the actual weight and must be saved.

The grad field is not important; it is used for accumulating the gradient of the layer in backpropagation when requires_grad is set to true. Then the optimizer (SGD, Adam) will multiply it by the learning rate (for SGD) or something more advanced (for Adam) and subtract the result from the value field. It is always zeroed during training, so there is no need to serialize it: https://github.com/mratsim/Arraymancer/blob/1a2422a1/src/arraymancer/autograd/gates_blas.nim#L18-L58
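As a rough illustration of the SGD case described above (not Arraymancer's optimizer code): `Param` is a hypothetical stand-in for a Variable, with plain `seq[float32]` in place of tensors, and zeroing the gradient inside the update step is an assumption made for the sketch.

```nim
type
  Param = object
    ## Hypothetical stand-in for a Variable: only `value` is worth saving.
    value: seq[float32]
    grad: seq[float32]

proc sgdStep(p: var Param, learningRate: float32) =
  ## Plain SGD: value <- value - learningRate * grad, then reset grad.
  for i in 0 ..< p.value.len:
    p.value[i] -= learningRate * p.grad[i]
    p.grad[i] = 0'f32

when isMainModule:
  var p = Param(value: @[1.0'f32, 2.0'f32], grad: @[0.5'f32, -1.0'f32])
  sgdStep(p, 0.5'f32)
  echo p.value
  echo p.grad
```

Since the gradient is reset after every update, a model restored with only its `value` data loses nothing needed for inference.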

elcritch commented 4 years ago

Thanks, I tried reading the dsl.nim but got lost on where things were defined.

Based on those code snippets, the only thing I'm not sure is set up correctly is the context; perhaps it's getting set incorrectly. The grad and requires_grad fields shouldn't be needed. They're probably being overwritten, but it sounds like it shouldn't matter in the ctx.no_grad_mode where I'm doing the prediction.

If I understand it correctly, the nodes field isn't used for computation or for doing a model.forward, right?

elcritch commented 4 years ago

I'm only saving the model after training. Are any of the above stored in the ctx?

edit: Looking through this more, it doesn't appear so. I think the model is being saved/restored correctly. It may be a bug in how I'm ordering my tests when doing predictions. The squeeze/unsqueeze operations didn't work well on my 1-item labels.

disruptek commented 4 years ago

Probably not terribly useful long-term, but for rough purposes you might try https://github.com/disruptek/frosty. It’s kinda designed for “I know what I’m doing” hacks and it could help your differential diagnosis.

I used msgpack4nim but wanted more of a fire-and-forget solution that I could trust.

forest1102 commented 3 years ago

Any Update? How can I save and retrieve a model's weights?

Niminem commented 3 years ago

I'm so at a loss for saving and loading models.. respectfully, how are we supposed to use arraymancer for deep learning without being able to do this?

Niminem commented 3 years ago

Things I learned from trying to solve this problem all day, hope it helps someone:

In order to save/load the weights and biases of your model, you'll first need to define these manually:

  1. Layer types
  2. Network type
  3. weight and bias initializations
  4. Network init proc
  5. forward proc

working test example:

type
  LinearLayer = object
    weight: Variable[Tensor[float32]]
    bias: Variable[Tensor[float32]]
  ExampleNetwork = object
    hidden: LinearLayer
    output: LinearLayer

template weightInit(shape: varargs[int], init_kind: untyped): Variable =
  ctx.variable(
    init_kind(shape, float32),
    requires_grad = true)

proc newExampleNetwork(ctx: Context[Tensor[float32]]): ExampleNetwork =
  result.hidden.weight = weightInit(HIDDEN_D, INPUT_D, kaiming_normal)
  result.hidden.bias = ctx.variable(zeros[float32](1, HIDDEN_D), requires_grad = true)
  result.output.weight = weightInit(OUTPUT_D, HIDDEN_D, yann_normal)
  result.output.bias = ctx.variable(zeros[float32](1, OUTPUT_D), requires_grad = true)

proc forward(network: ExampleNetwork, x: Variable): Variable =
  result = x.linear(
    network.hidden.weight, network.hidden.bias).relu.linear(
      network.output.weight, network.output.bias)

Then, you'll need to create your save/load procs. I'll save you the headache here as well- use numpy files. Long story short, forget about hdf5.. and the others aren't as efficient.

working test example:

proc save(network: ExampleNetwork) =
  network.hidden.weight.value.write_npy("hiddenweight.npy")
  network.hidden.bias.value.write_npy("hiddenbias.npy")
  network.output.weight.value.write_npy("outputweight.npy")
  network.output.bias.value.write_npy("outputbias.npy")

proc load(ctx: Context[Tensor[float32]]): ExampleNetwork =
  result.hidden.weight = ctx.variable(read_npy[float32]("hiddenweight.npy"), requires_grad = true)
  result.hidden.bias = ctx.variable(read_npy[float32]("hiddenbias.npy"), requires_grad = true)
  result.output.weight = ctx.variable(read_npy[float32]("outputweight.npy"), requires_grad = true)
  result.output.bias = ctx.variable(read_npy[float32]("outputbias.npy"), requires_grad = true)

At some point in the future I'll work on getting the network macro to integrate loading and saving models but for now, this POC/example should help push you in the right direction.