Purpose of Passing New Frame into State Memory with Previous Action

Hi Phil, huge fan of your work. I have two questions regarding the policy gradient TensorFlow code for SpaceInvaders:

1. In reinforce_cnn_tf.py, the choose_action function contains this line:

probabilities = self.sess.run(self.actions, feed_dict={self.input: observation})[0]

Does the [0] here select the first of the 4 probability distributions? If so, the actions are chosen based on the first frame (the 0th observation) of stacked_frames. Is that right?
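To make the question concrete, here is a minimal sketch of what the [0] would do under one possible reading. It assumes the network output has one row of action probabilities per batch element (shape (1, n_actions) for a single stacked observation), which may or may not match the repository. The function fake_policy_output is a hypothetical stand-in for the sess.run call, and the 6 actions and logit values are made up for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a single list of logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_policy_output(batch_logits):
    # Hypothetical stand-in for sess.run(self.actions, ...).
    # Assumption: one row of probabilities per batch element,
    # not one row per stacked frame.
    return [softmax(row) for row in batch_logits]


# Assumed: a batch holding 1 stacked observation, 6 discrete actions.
batch_logits = [[0.5, 1.0, -0.3, 0.0, 2.0, 0.1]]
output = fake_policy_output(batch_logits)

# Under this assumption, [0] only strips the batch dimension.
probabilities = output[0]
print(len(output), len(probabilities))
```

If the output instead had 4 rows, one per stacked frame, then [0] would pick the distribution for the first frame only, which is the reading asked about above.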

2. Assuming my first assumption is right, there are these lines in main_tf_reinforce_space_invaders.py:

observation, reward, done, info = env.step(action)
observation = preprocess(observation)
stacked_frames = stack_frames(stacked_frames, observation, stack_size)
agent.store_transition(observation, action, reward) (this one)

Here the new observation is stored together with the action that was chosen from the 0th observation in stacked_frames. If that is the case, why does this work when training the agent? Are the probability distributions produced when the observations are fed in different from the labels?

philtabor / Youtube-Code-Repository
