titu1994 / neural-architecture-search

Basic implementation of [Neural Architecture Search with Reinforcement Learning](https://arxiv.org/abs/1611.01578).

about RNN prediction #12

Open Guocode opened 5 years ago

Guocode commented 5 years ago

Why does the policy network use state[0] as the input rather than the whole state? It is difficult to understand how the policy network can predict the whole network architecture from only the first layer. I think it should at least use state[-1] (the last layer of the previous state) to predict the first layer of the next state.

titu1994 commented 5 years ago

Hmm. When I was implementing this, many of the details were not available in the paper, so I had to come up with reasonable defaults. Of course, those may have been wrong.

You may try using either the first or the last state. I chose the first state because the output of the first RNN step is chained as the next input, so it made logical sense to have state[0] as the initial input.
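Roughly, that chaining works like the sketch below. This is a minimal NumPy toy, not code from this repo: the cell, the weights, and the "embedding of state[0]" are all hypothetical stand-ins for illustration.

```python
import numpy as np

# Toy autoregressive controller: a single tanh RNN cell whose output at step t
# is fed back as the input at step t+1. All names and weights here are
# hypothetical stand-ins, not the repo's actual controller.
rng = np.random.default_rng(0)
state_size, num_tokens = 8, 4          # e.g. 4 tokens: [filters, kernel, filters, kernel]

W_x = rng.normal(size=(state_size, state_size)) * 0.1
W_h = rng.normal(size=(state_size, state_size)) * 0.1

def rnn_step(x, h):
    """One RNN step; the new hidden state doubles as the step's output."""
    return np.tanh(x @ W_x + h @ W_h)

x = rng.normal(size=(1, state_size))   # stand-in for the embedding of state[0]
h = np.zeros((1, state_size))

predictions = []
for _ in range(num_tokens):
    h = rnn_step(x, h)
    predictions.append(h)              # the real controller would apply a softmax here
    x = h                              # chain: this step's output is the next step's input

print(len(predictions), predictions[0].shape)   # 4 tokens, each a (1, 8) vector
```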

If you try state[-1] and it works, I would be glad to change it.

I must say, the progressive NAS version is much more stable than this RL codebase.

Guocode commented 5 years ago

Thanks very much for your reply and for the implementation. In fact I'm not so familiar with RNNs, but I think it's more reasonable to predict the next state from the whole previous state: if you want to predict the next sentence someone will say, the whole previous sentence should be considered, not just its last word.

titu1994 commented 5 years ago

I don't really understand what you mean by the "whole" state. Keras, and an RNN cell in general, will only accept states of shape (batch size, state size), one per hidden state, as input.

If by "whole state" your mean all timesteps of the state (batch size, timesteps, state size) * number of hidden states, it is not possible to do so.

That is why I chain the last timestep of this state vector to the input of the next RNN call. This is standard practice in machine translation and stateful RNN prediction.
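For concreteness, here is a small tf.keras sketch of the shape constraint being described: a recurrent cell's state is one (batch, units) tensor per hidden state, so a whole (batch, timesteps, units) sequence can only be consumed one timestep at a time as inputs, never passed in as a state. This is illustrative only and assumes TensorFlow's Keras API; it is not code from this repo.

```python
import tensorflow as tf

units, batch, input_dim = 8, 1, 3
cell = tf.keras.layers.SimpleRNNCell(units)

x_t = tf.zeros((batch, input_dim))     # exactly one timestep of input: (batch, input_dim)
states = [tf.zeros((batch, units))]    # one (batch, units) tensor per hidden state

# A cell call consumes a single timestep and returns the output plus the new state(s).
out, states = cell(x_t, states)
print(out.shape, states[0].shape)      # (1, 8) (1, 8)

# A full sequence of shape (batch, timesteps, units) cannot be passed as `states`;
# it has to be fed to the cell step by step.
```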

Guocode commented 5 years ago

Let's take an easy example. A current_state is something like [(filter)16, (kernel)3, (filter)32, (kernel)5], encoded as [0, 0, 1, 1], which might represent a two-layer network. To predict the next state, the RNN in your code takes only the (kernel)5 as input and then predicts the encoded next_state, say [1, 1, 0, 1]. I understand that the RNN successively outputs next_state[0] from (kernel)5, then next_state[1] from next_state[0], and so on. What I mean is: could it instead take current_state[0] and drop the output, then current_state[1] and drop it again, then current_state[2] and drop it again, and only once it reaches current_state[3] keep the subsequent outputs as next_state? Under this scheme the RNN has gone through the "whole" current_state and "remembers" all of it, so its prediction is made with complete knowledge of the current state.

titu1994 commented 5 years ago

Yes, that's a way of priming the RNN states for prediction, and it can be done. I haven't implemented that, but if you wish, you can send a PR and I'll review it.
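Sketching what that priming could look like, using the same toy NumPy cell as in the earlier sketch (again, hypothetical stand-ins rather than the repo's actual controller): the cell first consumes every token of current_state while its outputs are discarded, then switches to autoregressive decoding to produce next_state.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, num_tokens = 8, 4

W_x = rng.normal(size=(state_size, state_size)) * 0.1
W_h = rng.normal(size=(state_size, state_size)) * 0.1

def rnn_step(x, h):
    """One RNN step; the new hidden state doubles as the step's output."""
    return np.tanh(x @ W_x + h @ W_h)

# Hypothetical embedded current_state, e.g. [(filter)16, (kernel)3, (filter)32, (kernel)5].
current_state = [rng.normal(size=(1, state_size)) for _ in range(num_tokens)]

# Phase 1: prime - run the whole current_state through the cell, dropping the outputs.
h = np.zeros((1, state_size))
for token in current_state:
    h = rnn_step(token, h)             # only the hidden state is kept

# Phase 2: decode - from the primed state, predict next_state autoregressively.
x = current_state[-1]                  # the last token of current_state seeds the decode
next_state = []
for _ in range(num_tokens):
    h = rnn_step(x, h)
    next_state.append(h)               # the real controller would apply a softmax here
    x = h
```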