Why does the value function seem different from that in the paper?

mlii / mfrl

Mean Field Multi-Agent Reinforcement Learning

MIT License

Why does the value function seem different from that in the paper? #1

Closed: woaipichuli closed this issue 6 years ago

woaipichuli commented 6 years ago

I found that the value function in the code is a Q-table, defined as Q = np.zeros((n_agents, dim_Q_state, n_actions)). However, the Q-function in the paper is defined over (state, action1, action2). Why are they different?

mlii commented 6 years ago

@woaipichuli Hi, this repo currently contains the code for the Ising model mentioned in Section 5.2; details can be found in Appendix C.2. Since the mean action here is the number of neighboring sites aligned in the same direction, we use a table to store the values for each (action, action_mean) pair at each site.

lyers179 commented 6 years ago

Is the mean action a probability distribution?

KornbergFresnel commented 6 years ago

@lyers179 yes

lyers179 commented 6 years ago

If the mean action is a probability distribution, how do you store the Q-table Q(s, action, mean action)? The mean-action index into the Q-table would be a float, not an integer.

KornbergFresnel commented 6 years ago

@lyers179 You may have some misunderstandings about deep reinforcement learning and MDPs. Instead of storing a Q-table, we generally store many tuples (or transitions) like <S_t, a_t, r_t, S_{t+1}>.
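
For readers unfamiliar with storing transitions instead of a table, here is a minimal sketch of a replay buffer holding <S_t, a_t, r_t, S_{t+1}> tuples. It is a generic illustration, not code from this repo; the names `Transition` and `ReplayBuffer` and the capacity value are assumptions made for the example.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one transition <S_t, a_t, r_t, S_{t+1}>.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-size store of transitions (illustrative, not the repo's class)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self._storage = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self._storage.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    # States, actions, and rewards here are placeholder values.
    buffer.push(state=t, action=t % 2, reward=float(t), next_state=t + 1)

batch = buffer.sample(2)
```

With a buffer like this, a function approximator can take a continuous mean action as an input feature, so no integer index is needed.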

mlii commented 6 years ago

@lyers179 Hi, in the case of MFQ for the Ising model, the mean action is the number of neighboring sites aligned in the same direction, which takes only discrete values. Thus we can use a table to store the values for each (action, action_mean) pair at each site. For other applications, the definition of the mean action could be different.
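
To make the indexing concrete, here is a minimal sketch of how a neighbor count can serve as an integer index into a table shaped like Q = np.zeros((n_agents, dim_Q_state, n_actions)). It assumes a 2D square lattice with 4 nearest neighbors and periodic boundaries, takes "aligned" to mean spin-up, and sets dim_Q_state = 5; these choices and the helper names are assumptions for illustration and may differ from the repo's actual encoding.

```python
import numpy as np

N_NEIGHBORS = 4                # assumed: 2D square lattice, nearest neighbors
DIM_Q_STATE = N_NEIGHBORS + 1  # neighbor count can be 0, 1, 2, 3, or 4
N_ACTIONS = 2                  # spin down / spin up


def up_neighbor_count(spins, i, j):
    """Count spin-up (+1) nearest neighbors of site (i, j), periodic boundary."""
    n_rows, n_cols = spins.shape
    neighbors = (
        spins[(i - 1) % n_rows, j],
        spins[(i + 1) % n_rows, j],
        spins[i, (j - 1) % n_cols],
        spins[i, (j + 1) % n_cols],
    )
    return int(sum(1 for s in neighbors if s == 1))


spins = np.array([[1, -1, 1],
                  [1, 1, -1],
                  [-1, 1, 1]])

# One small table per site, indexed by (neighbor count, action).
q = np.zeros((spins.size, DIM_Q_STATE, N_ACTIONS))

i, j = 1, 1
agent = i * spins.shape[1] + j           # flatten (i, j) to an agent index
count = up_neighbor_count(spins, i, j)   # integer, usable as a table index
mean_action = count / N_NEIGHBORS        # the same information as a fraction
q_values = q[agent, count]               # Q-values for both actions at this site
```

The fraction `mean_action` is a float, but it is in one-to-one correspondence with the integer `count`, which is why a table suffices in this setting.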

lyers179 commented 6 years ago

Thank you so much, I understand.