yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).
MIT License

Why does the program use only two states? #33

Closed guotong1988 closed 7 years ago

guotong1988 commented 7 years ago

I read it from here. Why does the program use only the current state and the next state? Why does using just these two states work? Thank you @yenchenlin

ColdCodeCool commented 7 years ago

@guotong1988 I think you should learn the very basic concepts of reinforcement learning first. It is basically dynamic programming: the state changes from step to step. You'd better learn about Markov Decision Processes and the Bellman equation first.

guotong1988 commented 7 years ago

> the state changes from step to step

Thank you. Could you please also have a look at my other question? It is also in the issues. Thanks!

guotong1988 commented 7 years ago

Thinking about it the other way around: why not use just one state instead of two?

ColdCodeCool commented 7 years ago

@guotong1988 No, you cannot use only one state. Intuitively, you must interact with the environment by acting in order to learn. Once your action is done, you are in another state and you receive a reward or punishment from the environment; that is what lets you learn something.
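The point above is exactly why Q-learning stores (state, action, reward, next state) tuples: the one-step Bellman backup needs both the state you acted in and the state you landed in, and nothing else. Here is a minimal tabular sketch (the MDP, sizes, and hyperparameters are made up for illustration, not taken from this repo):

```python
import numpy as np

# Hypothetical tiny MDP: 5 states, 2 actions, tabular Q-learning.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # The TD target uses only the next state, not the full history:
    # target = r + gamma * max_a' Q(s', a')
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# One transition: acted with action 1 in state 0, got reward 1.0,
# landed in state 2. This single tuple is enough for one update.
q_update(state=0, action=1, reward=1.0, next_state=2)
```

The deep version in this repo replaces the table `Q` with a neural network, but the update target has the same two-state shape.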

ColdCodeCool commented 7 years ago

@guotong1988 For a comprehensive understanding, you should learn MDP theory first.

guotong1988 commented 7 years ago

The key point is that these two states are adjacent. That is, the situation in the second state was determined by the preceding several steps.

ColdCodeCool commented 7 years ago

@guotong1988 Like I said, you really need to learn MDPs first. The Markov property says that the current state captures all relevant information from the history; thus the future state depends only on the current state. In mathematical form, P[s_{t+1} | s_t] = P[s_{t+1} | s_1, ..., s_t].
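The Markov property can be made concrete with a toy Markov chain: the next state is sampled from a row of the transition matrix indexed by the current state alone, so the earlier history never enters the computation (the matrix below is a made-up example, not anything from the repo):

```python
import numpy as np

# Hypothetical 2-state transition matrix: row i gives P[s_{t+1} | s_t = i].
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def step(rng, current_state, history=None):
    # 'history' is deliberately unused: by the Markov property,
    # P[s_{t+1} | s_t] = P[s_{t+1} | s_1, ..., s_t].
    return rng.choice(2, p=P[current_state])

rng = np.random.default_rng(0)
s = 0
trajectory = [s]
for _ in range(5):
    s = step(rng, s, history=trajectory)  # history is ignored
    trajectory.append(s)
```

This is why one transition (s_t, s_{t+1}) carries all the information the learner needs about "what happens next".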

guotong1988 commented 7 years ago

The answer: one state contains 4 frames.
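That resolution is worth spelling out: a single "state" is a stack of the 4 most recent preprocessed frames, so the network can infer velocity and direction from one input tensor. A sketch of that stacking, assuming 80x80 grayscale frames as in this repo's preprocessing (the helper names here are mine, not the repo's):

```python
import numpy as np

def make_state(frames):
    # frames: list of the last 4 preprocessed (80, 80) grayscale frames.
    # Stack along the channel axis -> shape (80, 80, 4).
    return np.stack(frames, axis=-1)

def next_state(state, new_frame):
    # Drop the oldest frame, append the newest one.
    return np.concatenate([state[..., 1:], new_frame[..., None]], axis=-1)

frames = [np.zeros((80, 80)) for _ in range(4)]
s = make_state(frames)                     # initial state, all zeros
s2 = next_state(s, np.ones((80, 80)))      # one new frame arrives
```

So "two states" in the code are really two overlapping 4-frame windows, one time step apart.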