yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).
MIT License

The final loss gradient is 1-D but the network output is (1, 2). How is the gradient propagated? #70

Open prateethvnayak opened 4 years ago

prateethvnayak commented 4 years ago

I was wondering about the loss dimensions: the output of tf.reduce_sum and y are both 1-D, so the MSE cost term is 1-D. However, the gradient to be propagated back needs the same dimensions as the network output, i.e. (1, ACTIONS) = (1, 2). Is the final loss gradient just replicated along both dimensions, i.e. (1, 1) -> (1, 2)?
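
From what I can tell, it isn't simple replication: the backward pass of tf.reduce_sum broadcasts the upstream gradient across the reduced axis, and the one-hot action mask a then zeroes the slot of the untaken action. Here is a minimal TF2 sketch to check the gradient shape, using toy values and assuming the masked-sum loss form (readout_action = tf.reduce_sum(readout * a, axis=1)) used in deep_q_network.py:

```python
import tensorflow as tf

# Toy values (hypothetical, for shape-checking only)
readout = tf.Variable([[1.5, -0.3]])   # network output, shape (1, 2)
a = tf.constant([[1.0, 0.0]])          # one-hot mask for the chosen action
y = tf.constant([2.0])                 # TD target, shape (1,)

with tf.GradientTape() as tape:
    # Select the Q-value of the taken action: (1, 2) -> (1,)
    readout_action = tf.reduce_sum(readout * a, axis=1)
    # Scalar MSE cost
    cost = tf.reduce_mean(tf.square(y - readout_action))

grad = tape.gradient(cost, readout)
print(grad)  # shape (1, 2): [[-1.0, 0.0]] -- the untaken action's entry is 0
```

So the scalar cost gradient is broadcast back to shape (1, 2) by reduce_sum, and the multiplication by a ensures only the taken action's Q-value receives a nonzero gradient.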