How to implement MAAC/MFAC for Gaussian Squeezing?

Hi, I ran into some confusion while trying to reproduce your work, in particular experiment (1) on Gaussian squeezing. As I understand it, to implement the MAA2C algorithm as described in DeepMind's NeurIPS 17 paper, the critic network should represent the Q-value function, which takes the joint action of all players as input. However, the Gaussian squeeze task seems to be a stateless environment. According to your implementation details, the AC methods use a discount factor \gamma, but the Q-learning method does not. So how do you define the state for Gaussian squeezing? And if it is stateless, how can one use A2C methods?
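
To make the question concrete, here is a minimal sketch of how I currently picture the task: a one-step (bandit-style) problem with a constant dummy state. The reward form G(x) = x * exp(-((x - mu) / sigma)^2), the values of `mu` and `sigma`, the agent count, and all names below are my own assumptions for illustration, not taken from your code.

```python
import math
import random


def gaussian_squeeze_reward(actions, mu=400.0, sigma=200.0):
    """Shared reward G(x) = x * exp(-((x - mu) / sigma) ** 2), with x = sum(actions).

    The formula and the default mu/sigma are assumptions for illustration.
    """
    x = sum(actions)
    return x * math.exp(-((x - mu) / sigma) ** 2)


def one_step_return(reward, gamma):
    """Discounted return of an episode that terminates after a single step."""
    # Only the t = 0 term exists and gamma ** 0 == 1, so gamma has no effect.
    return (gamma ** 0) * reward


# Assumption: a constant placeholder observation fed to both actor and critic.
DUMMY_STATE = 0

rng = random.Random(0)
# Assumption: 100 agents, each choosing an integer action in 0..9.
actions = [rng.randrange(10) for _ in range(100)]
r = gaussian_squeeze_reward(actions)

# With one-step episodes the critic target is just r, whatever gamma is.
print(one_step_return(r, 0.5) == one_step_return(r, 0.99))
```

If this picture is right, the critic would effectively be a function of the joint action only, and \gamma would never enter the update, which is why I am unsure what role it plays in your AC setup.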

