Closed ssamot closed 9 years ago
Persisting states in an RNN is not a good idea in batch training, as you have to reset some states conditionally.
But anyway, it's easy to do in deepy. You can create a new layer or just reuse the RNN in deepy.
Here is an example:
```python
import theano

from deepy import *


class MyRNNWithPersistHiddenValue(NeuralLayer):
    def __init__(self):
        super(MyRNNWithPersistHiddenValue, self).__init__("MyRNNWithPersistHiddenValue")

    def setup(self):
        self.rnn = RNN(hidden_size=100, input_type="sequence",
                       output_type="sequence").connect(input_dim=self.input_dim)
        self.output_dim = self.rnn.output_dim
        self.register_inner_layers(self.rnn)
        self.h0 = self.create_vector(100, name="hidden cell")
        # To manipulate h0, you can register updates like this
        self.register_updates((self.h0, (self.h0 >= 0) * self.h0))
        # Or register callbacks like this, but it's slower
        self.register_training_callbacks(self.iter_callback)
        self.register_testing_callbacks(self.iter_callback)

    def iter_callback(self):
        if "something":  # placeholder condition: replace with your own reset criterion
            self.h0.set_value(self.h0.get_value(borrow=True) * 0)

    def output(self, x):
        # Swap the dimensions of x:
        # (batch, time, data) ---> (time, batch, data)
        seq = x.dimshuffle((1, 0, 2))
        # The step function of the RNN receives (sequence, hidden) as input
        # and outputs the processed hidden variables
        hiddens, _ = theano.scan(self.rnn.step,
                                 sequences=[seq],
                                 outputs_info=[self.h0])
        # Restore the dimension order
        return hiddens.dimshuffle((1, 0, 2))
```
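To make the carry-and-reset semantics of the layer above concrete without deepy or Theano, here is a minimal NumPy sketch. The tanh cell, weight scales, and class name are illustrative assumptions, not deepy internals; the point is only that the hidden state lives outside any single call (like `self.h0`) and is zeroed by an explicit reset (like `iter_callback`):

```python
import numpy as np

class PersistentRNN:
    """Toy tanh RNN whose hidden state survives across calls,
    mimicking the role of self.h0 in the deepy layer above."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.RandomState(seed)
        self.W_x = rng.randn(input_dim, hidden_dim) * 0.5
        self.W_h = rng.randn(hidden_dim, hidden_dim) * 0.5
        self.h = np.zeros(hidden_dim)  # plays the role of h0

    def forward(self, seq):
        # seq: (time, input_dim); the state carries over from previous calls
        outs = []
        for x_t in seq:
            self.h = np.tanh(x_t @ self.W_x + self.h @ self.W_h)
            outs.append(self.h)
        return np.stack(outs)

    def reset(self):
        # The conditional zeroing done in iter_callback above
        self.h = np.zeros_like(self.h)

rnn = PersistentRNN(input_dim=3, hidden_dim=4)
seq = np.ones((2, 3))
h_first = rnn.forward(seq)[-1]
h_second = rnn.forward(seq)[-1]   # starts from h_first, not from zeros
rnn.reset()
h_reset = rnn.forward(seq)[-1]    # starts from zeros again, like h_first did
```

Feeding the same sequence twice gives different final states (the state persisted), while resetting reproduces the first run exactly.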
But unless you update self.h0 with the hiddens, it will always keep the initial values, right? Where is this done?
Well, in Theano, implementing a language model in the fashion of RNNLM is actually not easy. I will dig into this issue over the next few days, and hopefully provide an example of RNN-based language model training.
Thanks! I think state is needed for a lot of reasons - it will make life easier in online learning as well. It might also prove useful if one wants to train sequences of uneven length without padding.
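On the uneven-length point: with explicit state handling you can feed each variable-length sequence on its own, carrying the state within a sequence and resetting it at sequence boundaries, with no padding at all. A hedged NumPy sketch of that loop (the tanh cell and all names are illustrative, not deepy API):

```python
import numpy as np

def run_sequences(sequences, W_x, W_h):
    """Process variable-length sequences one at a time, with no padding:
    the hidden state is carried inside a sequence and reset between them."""
    hidden_dim = W_h.shape[0]
    finals = []
    for seq in sequences:                      # each seq has shape (T_i, input_dim)
        h = np.zeros(hidden_dim)               # reset at the sequence boundary
        for x_t in seq:
            h = np.tanh(x_t @ W_x + h @ W_h)   # carry state within the sequence
        finals.append(h)
    return np.stack(finals)

rng = np.random.RandomState(0)
W_x, W_h = rng.randn(3, 4) * 0.5, rng.randn(4, 4) * 0.5
batch = [rng.randn(t, 3) for t in (2, 5, 9)]   # uneven lengths, no padding
finals = run_sequences(batch, W_x, W_h)        # one final state per sequence
```

The trade-off versus padded minibatches is throughput: this processes one sequence at a time, but never wastes computation on padding tokens.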
Well, although some code refactoring is still needed, you can see https://github.com/uaca/deepy/blob/master/experiments/lm/baseline_rnnlm.py for an example implementation of a vanilla language model.
Thanks for this - will look into it
Hello - is there a way to keep the state (the activations) of the RNN without resetting it, so it can be used incrementally in an online mode (i.e., train whenever a new example is given, while keeping all the past information)?
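The property the online mode relies on is that a stateful cell gives the same result whether the stream is processed all at once or in arbitrary pieces, as long as the state is never reset between calls. A small NumPy check of that equivalence (toy tanh cell; all names are illustrative assumptions):

```python
import numpy as np

def run(h, xs, W_x, W_h):
    # Plain tanh recurrence; returns the final hidden state
    for x_t in xs:
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

rng = np.random.RandomState(1)
W_x, W_h = rng.randn(3, 4) * 0.5, rng.randn(4, 4) * 0.5
stream = rng.randn(10, 3)

# Offline: the whole sequence in one pass, starting from zeros
h_full = run(np.zeros(4), stream, W_x, W_h)

# Online: the same examples arrive in pieces; the state is never reset,
# so each call resumes exactly where the previous one stopped
h = np.zeros(4)
for chunk in (stream[:4], stream[4:7], stream[7:]):
    h = run(h, chunk, W_x, W_h)
```

Both paths end in the same state, so examples can be consumed as they arrive without losing past information.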