keon / deep-q-learning

Minimal Deep Q Learning (DQN & DDQN) implementations in Keras

https://keon.io/deep-q-learning

MIT License

memory for state #14

Open fi000 opened 6 years ago

fi000 commented 6 years ago

Thanks, Keon, for your great code! I have two questions:
1. What does [0] mean in self.model.predict(next_state)[0] and return np.argmin(act_values[0])? Does it select the first element of the batch?
2. In addition to batching, I need my state to include the states from the K most recent time steps. What change is necessary to do this? I want to pass state[i-k+1], ..., state[i-1], state[i] as the state, not just a single state. How can I do this?

Thanks again

CarterEllsworth commented 6 years ago

The Keras Model API tells us that model.predict(predict_batch) returns a NumPy array containing one array of predictions for each element in predict_batch. Since model.predict is being called on a batch that holds a single state, next_state, we want the first and only element of the prediction array. Hence the [0].
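A minimal sketch of the shapes involved, using only NumPy. The predict function below is a hypothetical stand-in that mimics the batch-in, batch-out behaviour of model.predict with a random linear map; it is not the Keras model from the repository, and STATE_SIZE and ACTION_SIZE are assumed values chosen for illustration.

```python
import numpy as np

STATE_SIZE = 4   # assumed number of inputs per state
ACTION_SIZE = 2  # assumed number of actions

rng = np.random.default_rng(0)
weights = rng.normal(size=(STATE_SIZE, ACTION_SIZE))


def predict(batch):
    """Hypothetical stand-in for model.predict.

    Maps an input of shape (batch_size, STATE_SIZE) to an output of
    shape (batch_size, ACTION_SIZE), one row of values per input row.
    """
    return batch @ weights


# A single state is wrapped into a batch of size 1 before prediction.
next_state = np.reshape(rng.normal(size=STATE_SIZE), [1, STATE_SIZE])

act_values = predict(next_state)   # shape (1, ACTION_SIZE): still a batch
q_values = act_values[0]           # shape (ACTION_SIZE,): the only row

# argmax picks the greedy action when higher values are better.
action = int(np.argmax(q_values))

print(act_values.shape, q_values.shape, action)
```

Without the [0], the result would still carry the leading batch dimension of size 1, so the values for the single state would sit one level deeper than expected.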

fi000 commented 6 years ago

Thanks for the answer to question 1. For question 2, I still do not understand how to do it. How can I keep a memory of past states? The Nature paper uses such a memory, with a length of 4, and this is in addition to using a batch.

pskrunner14 commented 6 years ago

@fi000 can you provide a link to the paper you mentioned?

self.model.predict(next_state)[0] predicts on a batch, as @CarterEllsworth pointed out. It returns an array of predictions for each of the elements in the batch, but since we're only predicting on one state we only want the first and only prediction, hence the [0].
You could somehow normalize over the last k states as I've implemented here. You'll need to adjust the dimensions according to whatever best suits your task.

fi000 commented 6 years ago

@pskrunner14 You can refer to the paper "Playing Atari with Deep Reinforcement Learning", section 4.1, the last sentences of the first paragraph. I have a state with 5 inputs, but I have a problem feeding, for instance, 4 states together as one input. How could we do this in this code?
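One possible way to approach this, sketched under assumptions: keep the last K states in a fixed-length buffer and concatenate them into a single network input. StateHistory is a hypothetical helper written for this illustration, not part of the repository, and the values 5 and 4 are taken from the question above. It flattens the history into one vector, which assumes a fully connected first layer.

```python
from collections import deque

import numpy as np

STATE_SIZE = 5  # inputs per state, as in the question
K = 4           # number of most recent states fed to the network


class StateHistory:
    """Hypothetical helper that keeps the last k states as one input."""

    def __init__(self, k, state_size):
        self.k = k
        self.state_size = state_size
        # A deque with maxlen drops the oldest entry automatically.
        self.frames = deque(maxlen=k)

    def reset(self, first_state):
        # Fill the buffer with copies of the first state so the input
        # already has its full length at the start of an episode.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(first_state, dtype=np.float32))
        return self.stacked()

    def push(self, state):
        # Add the newest state and return the updated stacked input.
        self.frames.append(np.asarray(state, dtype=np.float32))
        return self.stacked()

    def stacked(self):
        # Oldest state first, flattened to shape (1, k * state_size).
        flat = np.concatenate(list(self.frames))
        return flat.reshape(1, self.k * self.state_size)


history = StateHistory(K, STATE_SIZE)
stacked = history.reset(np.zeros(STATE_SIZE))
for step in range(1, 4):
    # Dummy states filled with the step number, for illustration only.
    stacked = history.push(np.full(STATE_SIZE, step))

print(stacked.shape)
```

With this approach the stacked array would take the place of the single state wherever the agent stores or predicts on a state, and the input size of the first layer would become K * STATE_SIZE instead of STATE_SIZE.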