mlii / mfrl

Mean Field Multi-Agent Reinforcement Learning
MIT License

How to calculate MF-Value (eq(10)) in MF-AC/MF-Q #12

Open rezunli96 opened 5 years ago

rezunli96 commented 5 years ago

Hi, I have recently been trying to reproduce your work and am a little confused about implementing MF-AC. According to the algorithm, the MF-Value (eq. (10)) has to be calculated at some point, which seems to require a lot of computation to enumerate all possible mean-field actions and their probabilities. I took a look at your MF-AC implementation in the battle game, but it appears to me (please correct me if I am wrong) that the MF-values there are replaced with the returns from the sampled trajectory. Could you explain in more detail how to calculate the MF-value in eq. (10), for both MF-AC and MF-Q? Thanks.
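For concreteness, here is a minimal sketch of one way the MF-value could be approximated without enumerating mean-field actions. It assumes eq. (10) has the form v(s') = sum over a of pi(a | s', a_bar) * E[Q(s', a, a_bar)], that the policy is a Boltzmann (softmax) policy over Q-values, and that the expectation over neighbours' actions is replaced by a single observed mean action a_bar. The names `boltzmann_policy`, `mf_value`, and `beta`, as well as the sign convention for `beta`, are hypothetical and not taken from the repository.

```python
import math


def boltzmann_policy(q_values, beta=1.0):
    """Softmax over Q-values with inverse temperature beta (assumed policy form)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def mf_value(q_next, beta=1.0):
    """Approximate v(s') = sum_a pi(a | s', a_bar) * Q(s', a, a_bar).

    q_next[a] is Q(s', a, a_bar) evaluated at ONE observed mean action a_bar.
    Using a single a_bar instead of the full expectation over neighbours'
    actions is an assumption of this sketch.
    """
    pi = boltzmann_policy(q_next, beta)
    return sum(p * q for p, q in zip(pi, q_next))
```

With `beta=0` the policy is uniform and the value is the plain average of the Q-values; as `beta` grows the value approaches `max(q_next)`, i.e. an ordinary Q-learning target.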

rezunli96 commented 5 years ago

It just occurred to me that the return from the sampled trajectory is an unbiased estimator of the MF-Value? That works for a REINFORCE-like AC. But I am still confused about how to calculate it for off-policy RL like MF-Q.
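The distinction can be sketched as follows, as an illustration rather than the repository's actual code. For on-policy data, the discounted return of a sampled trajectory is a Monte Carlo estimate of the value, so nothing needs to be enumerated. For off-policy replay, one common alternative is a one-step bootstrapped target in which some estimate of the next-state value is plugged in. The function names and the `gamma` default are hypothetical.

```python
def discounted_returns(rewards, gamma=0.95):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one sampled trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_target(reward, next_value, done, gamma=0.95):
    """One-step target r + gamma * v(s'), with no bootstrap on terminal steps.

    next_value stands for whatever estimate of the MF-value is available
    for the next state; how it is obtained is left open in this sketch.
    """
    return reward + (0.0 if done else gamma * next_value)
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`, while `td_target(1.0, 2.0, False, gamma=0.5)` gives `2.0`.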