Open Guocode opened 5 years ago
Hmm. When I was implementing this, many of the details were not available in the paper, so I had to come up with reasonable defaults. Ofc, those may have been wrong.
You may try using either the first or the last state. I chose the first state because the output of the first RNN step is chained in as the next input, so it made logical sense to have state[0] as the initial input.
If you try state[-1] and it works, I would be glad to change it.
I must say, the progressive NAS version is much more stable than this RL codebase.
Thanks very much for your reply and the implementation. In fact, I'm not so familiar with RNNs, but I think it's more reasonable to predict the next state from the whole previous state. If you want to predict the sentence someone will say next, you should consider the whole previous sentence, not just its last word.
I don't really understand what you mean by the "whole" state. Keras, and an RNN in general, will only accept a state of shape (batch size, state size) * number of hidden states as input.
If by "whole state" you mean all timesteps of the state, with shape (batch size, timesteps, state size) * number of hidden states, it is not possible to do so.
That is why I chain the last timestep of this state vector to the input of the next rnn call. This is standard practice in Machine Translation and stateful RNN prediction.
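To make the chaining described above concrete, here is a minimal sketch in plain Python. It is not the code from this repo: `make_cell` and `chained_rollout` are hypothetical helpers, the weights are random and untrained, and the input size is assumed to equal the hidden size so that an output can be fed straight back in as the next input.

```python
import math
import random


def make_cell(size, seed=0):
    """Build a toy tanh RNN cell with fixed random weights (illustration only)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]

    def step(x, h):
        # new_h[i] = tanh(sum_j w_in[i][j] * x[j] + sum_k w_h[i][k] * h[k])
        return [
            math.tanh(
                sum(w * xj for w, xj in zip(w_in[i], x))
                + sum(w * hk for w, hk in zip(w_h[i], h))
            )
            for i in range(size)
        ]

    return step


def chained_rollout(step, first_input, hidden, num_steps):
    """Run the cell num_steps times, feeding each output back in as the next input."""
    outputs = []
    x = first_input
    for _ in range(num_steps):
        hidden = step(x, hidden)
        outputs.append(hidden)
        x = hidden  # output of step t becomes the input of step t + 1
    return outputs, hidden


step = make_cell(size=3)
outputs, final_hidden = chained_rollout(
    step, first_input=[1.0, 0.0, 0.0], hidden=[0.0, 0.0, 0.0], num_steps=4
)
```

Only `first_input` comes from outside the loop; every later input is a previous output, which is why the choice between state[0] and state[-1] only affects the very first step.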
Let's take an easy example. Say current_state is [(filter)16, (kernel)3, (filter)32, (kernel)5], encoded as [0,0,1,1], which may represent a two-layer network. To predict the next state, in your code the RNN will take only (kernel)5 as input and then predict the encoded next_state, maybe [1,1,0,1]. I understand that the RNN will output next_state[0] from (kernel)5, then next_state[1] from next_state[0], and so on. What I mean is: could it work like this instead? The RNN takes current_state[0] and drops the output, takes current_state[1] and drops the output again, takes current_state[2] and drops it again, and when it reaches current_state[3], the outputs from that point on are taken as next_state. Under this assumption the RNN has gone through the 'whole' current_state and 'remembers' all of it, so its prediction is based on complete knowledge of the current state.
Yes, that's a way of priming the RNN states for prediction, and it can be done. I haven't implemented that, but if you wish, you can send a PR and I'll review it.
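A rough sketch of what such priming could look like, again in plain Python rather than the Keras code in this repo. Everything here is an assumption for illustration: the names `rnn_step` and `predict_next_state`, the two-value vocabulary, the toy hidden size, the random untrained weights, and the greedy argmax (a real controller would normally sample from the softmax instead).

```python
import math
import random

VOCAB = 2   # assumption: each encoded choice is an index in {0, 1}
HIDDEN = 4  # assumption: toy hidden size

_rng = random.Random(0)
W_IN = [[_rng.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(HIDDEN)]
W_H = [[_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
W_OUT = [[_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]


def rnn_step(token, hidden):
    """One tanh RNN step on a one-hot token; returns (logits, new_hidden)."""
    new_hidden = [
        # W_IN[i][token] is the one-hot input multiplied by the input weights.
        math.tanh(W_IN[i][token] + sum(W_H[i][k] * hidden[k] for k in range(HIDDEN)))
        for i in range(HIDDEN)
    ]
    logits = [
        sum(W_OUT[v][i] * new_hidden[i] for i in range(HIDDEN)) for v in range(VOCAB)
    ]
    return logits, new_hidden


def predict_next_state(current_state, length, prime=True):
    """Generate `length` tokens, optionally priming on all of current_state first."""
    hidden = [0.0] * HIDDEN
    if prime:
        # Read every token except the last and discard the outputs,
        # keeping only the hidden state they leave behind.
        for token in current_state[:-1]:
            _, hidden = rnn_step(token, hidden)
        token = current_state[-1]
    else:
        # Unprimed variant: start from state[0] with a zero hidden state.
        token = current_state[0]
    next_state = []
    for _ in range(length):
        logits, hidden = rnn_step(token, hidden)
        token = max(range(VOCAB), key=lambda v: logits[v])  # greedy pick
        next_state.append(token)  # each prediction is chained in as the next input
    return next_state


primed = predict_next_state([0, 0, 1, 1], length=4, prime=True)
unprimed = predict_next_state([0, 0, 1, 1], length=4, prime=False)
```

With `prime=True` the first predicted token depends on all four entries of the current state through the hidden state, whereas with `prime=False` it depends only on `current_state[0]`.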
Why does the policy network use state[0] as input rather than the whole state? It is difficult to understand how the policy network predicts the whole network architecture from only the first layer. I think it should at least use state[-1] (the last layer of the previous state) to predict the first layer of the next state.