yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).
MIT License

How are the 1 and -1 rewards used? #32

Closed guotong1988 closed 7 years ago

guotong1988 commented 7 years ago

I see from here that all the rewards are added to the deque. The 1 and -1 rewards must be sampled from the deque before they are used. So do you think this may be slow?

(Originally in Chinese:) Are the transitions with reward 1 and -1 also all put into the deque? If so, isn't the probability of sampling them very low, so the feedback would be very slow?

Thank you @yenchenlin
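
The situation being described can be sketched as follows. This is not the repo's code, just a minimal illustration: every transition, including the rare reward = 1 (pipe passed) and reward = -1 (crash) ones, goes into a single deque that is sampled uniformly, so rare rewards show up in a minibatch only in proportion to their share of the buffer. The constants and the reward pattern here are assumptions for the demo.

```python
# Minimal sketch of a uniform experience-replay buffer (not the repo's code).
import random
from collections import deque

REPLAY_MEMORY = 50000   # buffer capacity (assumed constant for this demo)
BATCH = 32              # minibatch size (assumed constant for this demo)

D = deque(maxlen=REPLAY_MEMORY)

# Fill the buffer with mostly small "still alive" rewards and a few
# reward = 1 / -1 transitions, mimicking Flappy Bird's reward sparsity.
for t in range(10000):
    if t % 100 == 0:
        r = 1 if t % 200 == 0 else -1   # rare pipe-passed / crash rewards
    else:
        r = 0.1                          # common "still alive" reward
    D.append(("s_t", "a_t", r, "s_t1"))  # (state, action, reward, next state)

minibatch = random.sample(D, BATCH)      # uniform sampling over the deque
rare = sum(1 for (_, _, r, _) in minibatch if r in (1, -1))
# With ~1% rare transitions, a 32-sample batch contains ~0.32 of them on
# average, which is the slow feedback guotong1988 is pointing out.
print(rare)
```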

ColdCodeCool commented 7 years ago

@guotong1988 I do not quite understand your question, but every transition is recorded as a 4-element tuple in the deque. Note that the reward is an instant reflection of the environment. I really think you should learn the basic concepts of the Markov Reward Process and the Markov Decision Process before starting reinforcement learning.

guotong1988 commented 7 years ago

(Originally in Chinese:) See here: the reward = 1 and reward = -1 transitions can only take effect after being sampled and fed into training, so they only contribute once they happen to be sampled.

ColdCodeCool commented 7 years ago

@guotong1988 I see, you are right. That is why, in another issue, the author suggested that prioritized experience replay can potentially address this, by increasing the probability that a more significant transition is chosen.
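
A minimal sketch of the prioritized experience replay idea mentioned above (an illustration under assumed names, not code from this repo): sample transitions with probability proportional to a priority. The priority here is simply |reward| plus a small epsilon; the original technique uses the magnitude of the TD error, but the effect on rare reward = 1 / -1 transitions is the same: they are replayed far more often than under uniform sampling.

```python
# Sketch of priority-proportional sampling (names are hypothetical).
import random

def sample_prioritized(buffer, batch_size, eps=0.01):
    """Sample indices with probability proportional to |reward| + eps."""
    priorities = [abs(r) + eps for (_, _, r, _) in buffer]
    total = sum(priorities)
    weights = [p / total for p in priorities]
    return random.choices(range(len(buffer)), weights=weights, k=batch_size)

# Buffer with 99 common reward = 0.1 transitions and one reward = -1 crash.
buffer = [("s", "a", 0.1, "s1")] * 99 + [("s", "a", -1, "s1")]
idx = sample_prioritized(buffer, batch_size=32)
crash_hits = sum(1 for i in idx if buffer[i][2] == -1)
# The crash transition has priority 1.01 vs 0.11 for the others, so it is
# drawn roughly 9x more often per slot than under uniform sampling.
print(crash_hits)
```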

guotong1988 commented 7 years ago

Oh, it's here. Thank you!

ColdCodeCool commented 7 years ago

@guotong1988 You are welcome :)