Question on freezing target network

yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

MIT License


Question on freezing target network #15

Open hashbangCoder opened 8 years ago

hashbangCoder commented 8 years ago

Hi @yenchenlin1994, love your implementation! I went through your code and I can't seem to find where you've frozen the target network. Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch? Wouldn't that hurt your convergence rate badly?
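For readers unfamiliar with the term, here is the usual DQN formulation the question refers to. The notation is an assumption of this note, not taken from the repo: $\theta$ are the online network weights, $\theta^{-}$ the frozen target weights, $\gamma$ the discount factor, and $C$ the number of steps between syncs.

$$
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
L(\theta) = \mathbb{E}\left[\big(y - Q(s, a; \theta)\big)^{2}\right], \qquad
\theta^{-} \leftarrow \theta \ \text{every } C \text{ steps}
$$

Without freezing, $\theta^{-} = \theta$ at every step, so the regression target $y$ shifts with each gradient update.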

yenchenlin commented 8 years ago

Hello, yes, you are right. I actually have a reimplemented version and will submit it soon!

(Sent by email in reply to the question above, Wed, Apr 20, 2016 at 17:46.)

hashbangCoder commented 8 years ago

Hi again, I'm trying to reproduce the results in Keras. I have trained for ~400,000 steps and the bird is still unable to cross the first pipe consistently. My loss is low though (~0.2) and the Q-values are in the range [0, 8]. How long did it take for you before it actually started working, i.e. crossing the first pipe consistently?

yenchenlin commented 8 years ago

I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.

xiahouzuoxin commented 7 years ago

I still cannot find where the target network is frozen in the current version's code. Does it really have no effect?

zsy372901 commented 7 years ago

@hashbangCoder I ran into the same problem: the silly bird just stays at the top of the screen... Did you fix it?

weijinsong commented 6 years ago

I also couldn't find the code that freezes the target network. But thanks for your code. It's helpful for me.

initial-h commented 6 years ago

I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network

patrick-llgc commented 5 years ago

Here is another repo with a target network: https://github.com/patrick-12sigma/DRL_FlappyBird

I made the target network an option. You can turn it on and off and experiment to see how much it affects the convergence of training.

I refactored the network into a class and added some logging functionality to track the training process. I also borrowed the human play function from @initial-h. Thanks!
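For anyone who wants to see the idea in isolation, here is a minimal sketch of a switchable target network. It is not the code from either linked repo: the names `TinyQ`, `Agent`, `use_target_network` and `sync_every` are hypothetical, and a small Q-table stands in for the convolutional network so that the example runs with the standard library alone.

```python
import copy


class TinyQ:
    """Tabular stand-in for a Q-network: q[state][action]. (Hypothetical helper.)"""

    def __init__(self, n_states, n_actions):
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def max_q(self, state):
        return max(self.q[state])


class Agent:
    def __init__(self, n_states, n_actions, gamma=0.99, lr=0.1,
                 use_target_network=True, sync_every=100):
        self.online = TinyQ(n_states, n_actions)
        # The target starts as an exact copy of the online "network".
        self.target = copy.deepcopy(self.online)
        self.gamma = gamma
        self.lr = lr
        self.use_target_network = use_target_network
        self.sync_every = sync_every
        self.step = 0

    def train_step(self, s, a, r, s_next, terminal):
        # Bootstrap from the frozen copy if enabled, else from the online values.
        boot = self.target if self.use_target_network else self.online
        y = r if terminal else r + self.gamma * boot.max_q(s_next)
        td_error = y - self.online.q[s][a]
        # Only the online values are updated on every step.
        self.online.q[s][a] += self.lr * td_error
        self.step += 1
        # Periodically copy the online values into the frozen target.
        if self.use_target_network and self.step % self.sync_every == 0:
            self.target = copy.deepcopy(self.online)
        return td_error
```

With `use_target_network=False` the bootstrap value comes from the same table that was just updated, which is the "update the target every batch" behaviour raised at the top of this thread. In a real DQN the `deepcopy` would be replaced by copying the weights between two networks.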