How to calculate MF-Value (eq(10)) in MF-AC/MF-Q

Hi, I have recently been trying to reproduce your work and am a little confused about implementing MF-AC. According to the algorithm, the MF-value (Eq. (10)) should be calculated, which seems to require a lot of computation to enumerate all possible mean-field actions and their probabilities. I took a look at your MF-AC implementation in the battle game, but it appears to me (please correct me if I am wrong) that the MF-values there are replaced with the returns from the sampled trajectory. Could you explain in more detail how to calculate the MF-value in Eq. (10), for both MF-AC and MF-Q? Thanks.

mlii / mfrl
