Closed woaipichuli closed 6 years ago
@woaipichuli Hi, this repo currently contains the code for the Ising model mentioned in Section 5.2; details can be found in Appendix C.2. Since the mean action here is the number of neighboring sites aligned in the same direction, we use a table to store the (action, action_mean) for each site.
Is the mean action a probability distribution?
@lyers179 yes
If the mean action is a probability distribution, how do you store the Q-table Q(s, action, mean action)? The mean-action index of the Q-table would be a float, not an integer.
@lyers179 You may have some misunderstandings about deep reinforcement learning and MDPs. Instead of storing a Q-table, we generally store many tuples (or transitions) of the form <S_t, a_t, r_t, S_{t+1}>.
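For illustration, a minimal sketch of storing transitions <S_t, a_t, r_t, S_{t+1}> in a fixed-size buffer instead of a Q-table. The names Transition and ReplayBuffer are hypothetical and are not taken from this repo.

```python
import random
from collections import deque, namedtuple

# One transition <S_t, a_t, r_t, S_{t+1}>; field names are illustrative.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self._buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Copy to a list so sampling does not depend on deque indexing behavior.
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    # States may be floats or vectors, since nothing here is used as a table index.
    buffer.push(state=0.1 * t, action=t % 2, reward=1.0, next_state=0.1 * (t + 1))

batch = buffer.sample(2)
```

Because the states are only stored and never used as array indices, continuous values such as a mean action that is a probability distribution cause no indexing problem; a function approximator is then fitted on sampled batches.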
@lyers179 Hi, in the case of MFQ for the Ising model, the mean action is the number of neighboring sites aligned in the same direction, which takes only a small set of discrete values. Thus we can use a table to store the (action, action_mean) for each site. For other applications, the definition of mean action could be different.
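A minimal sketch of why an integer-indexed table is enough in this case. It assumes a square lattice with 4 nearest neighbors and periodic boundaries, and it reads "aligned in the same direction" as "number of neighbors with spin up", which is only one possible interpretation. The function names are hypothetical and this is not the repo's code.

```python
import numpy as np

N_NEIGHBORS = 4  # assumption: square lattice, 4 nearest neighbors
N_ACTIONS = 2    # 0 = spin down, 1 = spin up


def count_up_neighbors(spins, row, col):
    """Discrete mean action: number of up-spin neighbors, an integer in 0..4."""
    n_rows, n_cols = spins.shape
    neighbors = [
        spins[(row - 1) % n_rows, col],
        spins[(row + 1) % n_rows, col],
        spins[row, (col - 1) % n_cols],
        spins[row, (col + 1) % n_cols],
    ]
    return int(sum(neighbors))


def make_q_table(n_rows, n_cols):
    """One (mean action, action) table per site, all entries starting at zero."""
    return np.zeros((n_rows * n_cols, N_NEIGHBORS + 1, N_ACTIONS))


spins = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0]])
q_table = make_q_table(*spins.shape)

row, col = 1, 1
agent = row * spins.shape[1] + col          # flatten (row, col) to a site index
mean_idx = count_up_neighbors(spins, row, col)
q_values = q_table[agent, mean_idx]         # one Q-value per action at this site
```

The count can only take the values 0 to 4, so it serves directly as an array index and no float indexing is needed.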
Thank you so much, I understand.
I found that the value function in the code is a Q-table defined as (Q = np.zeros((n_agents, dim_Q_state, n_actions))). However, the Q function in the paper is defined over (state, action1, action2). Why are they different?