Closed: msimberg closed this issue 9 years ago
@msimberg This could be easily added, but I removed it because I don't think it is very reasonable. I will provide a small utility to convert a JLD snapshot into an HDF5 model file, and since the HDF5 model file does not contain solver state, loading from it will essentially ignore the solver state.
That being said, I might be totally ignorant about the use case for not using `also_load_solver_state`. Could you tell me the scenario in which you need this? Maybe you can persuade me to add this option back, or I could give you an alternative solution?
You may be right. What I'm doing is trying to predict good moves for Go. I'm doing this by first evaluating what the good moves are with MCTS and then trying to learn the good moves with a convolutional network. But then I can use the outputs of the network as priors for MCTS to refine the evaluations, and then repeat the process. Now I have no idea if this will work well or work at all, but if I'm to continuously repeat this it would be nice to be able to just tell the solver to always do some fixed number of iterations and only load the weights. The idea is also that I should be able to load the network again on the next run of the program.
Now that I put my problem down in writing I think I probably do not need this :) I should just be explicitly saving only the model and then loading that, instead of relying on the snapshots. This would actually do exactly what I want (and more explicitly). So I think this can be closed and sorry for the noise.
One related question though (I can test this later today, so feel free to ignore): I'm using a training and a prediction network which share most of their layers. Will the layers actually still be shared if I save both models and then load them again? Or how is this handled? The names etc. will still be the same so I suppose Mocha will either complain or just accept it and share layers again.
@msimberg I think your scenario makes perfect sense. However, when doing incremental training, I think using the snapshot is still recommended: although the solver state is loaded, you can increase `max_iter` and re-run the script. It will pick up from the previous `max_iter` and run the subsequent iterations. This essentially achieves what you want, with two differences:
The network weights are shared if you construct the two networks with the same layer objects (those `InnerProductLayer`, `ConvolutionLayer`, etc.). To do this explicitly, there is a property called `param_key` for the layers with trainable weights. By specifying the same `param_key`, you can make layers share parameters. So what is important is not how you load them separately, but how you construct them. Note that in Mocha, the network structure is not stored in the snapshot; in order to load a network, you still use the same Julia code to construct the network topology, and then you can load the model parameters. So when you construct the network topology, you have already specified whether the parameters are shared between the two networks. A bit more detail can be found here: http://mochajl.readthedocs.org/en/latest/user-guide/network.html#shared-parameters
All right, thanks. Those are good points. I suppose I would have to either manually increase `max_iter` or somehow separately keep track of how many iterations it has done so far?
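If it helps, here is a rough sketch of bumping `max_iter` by a fixed budget on every run. The counter file is plain-Julia bookkeeping, not Mocha functionality, and the `SolverParameters` keyword arguments other than `max_iter` and `load_from` are illustrative assumptions:

```julia
using Mocha

const ITERS_PER_RUN = 10_000            # extra iterations to run this time
const COUNTER_FILE  = "iters_done.txt"  # hypothetical bookkeeping file
const SNAPSHOT_DIR  = "snapshots"

# How many iterations previous runs completed (0 on the very first run).
done = isfile(COUNTER_FILE) ? parse(Int, readchomp(COUNTER_FILE)) : 0

params = SolverParameters(max_iter=done + ITERS_PER_RUN,
                          lr_policy=LRPolicy.Fixed(0.01),
                          load_from=SNAPSHOT_DIR)
solver = SGD(params)

# Save snapshots periodically so the next run can resume from them.
add_coffee_break(solver, Snapshot(SNAPSHOT_DIR), every_n_iter=5000)

solve(solver, train_net)  # the training net constructed as usual

# Record the new total so the next run bumps max_iter past it.
open(COUNTER_FILE, "w") do io
    println(io, done + ITERS_PER_RUN)
end
```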
Re sharing: sounds good. I'm constructing the networks with the same layers, so when I load a snapshot for the training network, the weights get updated/loaded for the prediction network as well.
Forgive me if I've missed this, but `Snapshot` had an `also_load_solver_state` flag which could be set to false so that only the network weights are loaded. With the newest version it seems like it's only possible to load both the weights and the solver state with the `load_from` parameter in `SolverParameters`. Is there still/will there be a way to ignore the solver state when loading a snapshot?