In the training phase, `self.initial_state` is initialized with `cell.zero_state`, and the `last_state` returned by the decoder is kept:
```python
self.initial_state = cell.zero_state(args.batch_size, tf.float32)
outputs, last_state = legacy_seq2seq.rnn_decoder(inputs, self.initial_state, cell,
                                                 loop_function=loop if not training else None,
                                                 scope='rnnlm')
self.final_state = last_state
```
However, in the testing phase (`def sample()`) it seems that all the layers are fed just with the state of the last layer of the previous step, `self.final_state`.
If I'm not wrong, the state of each layer should be kept and fed back into its corresponding layer on the following steps, rather than feeding the last layer's state to all the layers.
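For what it's worth, when `cell` is a `MultiRNNCell`, the state that `rnn_decoder` returns is already a tuple holding one entry per layer, so feeding `self.final_state` back in should carry every layer's state, not only the last layer's. A minimal pure-Python sketch of that pattern (the `step` function is hypothetical, not the TensorFlow API):

```python
import numpy as np

def step(x, state):
    """One step of a hypothetical 2-layer stacked RNN.

    `state` is a tuple with one entry per layer; each layer
    reads its own previous state, and the whole tuple is
    returned so it can be fed back on the next step.
    """
    h1_prev, h2_prev = state
    h1 = np.tanh(x + h1_prev)   # layer 1 uses layer 1's state
    h2 = np.tanh(h1 + h2_prev)  # layer 2 uses layer 2's state
    return h2, (h1, h2)         # output + per-layer state tuple

# zero_state analogue: one zero vector per layer
state = (np.zeros(3), np.zeros(3))

for x in [np.ones(3), np.ones(3)]:
    out, state = step(x, state)  # feed the whole tuple back

print(len(state))  # one distinct state per layer survives the loop
```

If the sampling loop feeds this whole tuple back (as `feed = {..., self.initial_state: state}` does when the placeholder is the full multi-layer state), each layer keeps its own history, which would mean the code is correct after all.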