Closed hwz9612 closed 1 year ago
Hi,
First, please provide more details about the modified environment you used.
From the figure, the return of the first player appears to be increasing, which indicates that its capability is improving. If the Nash equilibrium of the game is at 0 (the game is symmetric for the two players), the value should converge to 0. Clearly it has not reached that point yet, and it needs longer training time.
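One practical way to check this is to track a moving average of the first player's episode returns and see whether it settles near the symmetric Nash value of 0. Below is a minimal sketch of such a check; the `window`, `nash_value`, and `tol` parameters are illustrative choices, not values from the nash-dqn-exploiter codebase.

```python
from collections import deque


def make_convergence_checker(window=100, nash_value=0.0, tol=0.05):
    """Return a function that tracks a moving average of episode returns
    and reports True once the average stays within `tol` of `nash_value`.

    Illustrative sketch only: window size and tolerance should be tuned
    to the noise level of the actual environment.
    """
    returns = deque(maxlen=window)

    def check(episode_return):
        returns.append(episode_return)
        if len(returns) < window:
            return False  # not enough episodes yet to judge
        avg = sum(returns) / len(returns)
        return abs(avg - nash_value) < tol

    return check


# Usage: feed in player-1 returns episode by episode (small window for demo).
check = make_convergence_checker(window=5, tol=0.05)
results = [check(r) for r in [0.4, -0.3, 0.2, -0.2, 0.0, 0.01, -0.02]]
```

If the flag never turns on (or the moving average plateaus far from 0), that suggests either more training time is needed or the modified environment is not actually symmetric.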
Hi, I recently used this nash-dqn-exploiter to train my model, and I rewrote the environment. I set the parameter max_steps_per_episode to 800. I have now been training for 6 days, but the model still hasn't converged. Can you give me any suggestions?