quantumiracle / nash-dqn

Official code of Nash-DQN, an algorithm for two-player zero-sum Markov games. For details, see our paper: A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games. Zihan Ding, Dijia Su, Qinghua Liu, Chi Jin

suggestion #6

Closed: hwz9612 closed this issue 1 year ago

hwz9612 commented 1 year ago

Hi, I recently used this nash-dqn-exploiter to train my model on an environment I rewrote myself. I set max_steps_per_episode to 800. I have now trained for 6 days, but the model still hasn't converged. Can you give me some suggestions?

[Image: training plot]

quantumiracle commented 1 year ago

Hi,

First, please provide more details about the modified environment you used.

From the figure, the first player's return seems to be increasing, which indicates that its capability is improving. If the Nash equilibrium value of the game is 0 (as in a symmetric two-player game), the return should converge to 0. It clearly hasn't reached that point yet, so it needs more training time.
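Why 0: in the simplest matrix-game case, symmetry means the payoff matrix satisfies A = -A^T, so x^T A x = 0 for every mixed strategy x. Combined with the minimax theorem, this forces the game value to be 0. As a rough way to monitor the criterion above, the sketch below smooths the first player's per-episode returns with a trailing moving average and checks whether the recent smoothed values stay within a tolerance of the expected equilibrium value. The helpers `moving_average` and `near_equilibrium`, as well as the window size and tolerance, are hypothetical choices for illustration. They are not part of the nash-dqn codebase.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average over the last `window` values (shorter at the start)."""
    buf = deque(maxlen=window)
    total = 0.0
    out: List[float] = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


def near_equilibrium(
    returns: List[float],
    target: float = 0.0,  # assumed equilibrium value (0 for a symmetric zero-sum game)
    window: int = 100,    # hypothetical smoothing window, in episodes
    tol: float = 0.05,    # hypothetical tolerance around the target
) -> bool:
    """Hypothetical check: True if the smoothed first-player return has stayed
    within `tol` of `target` for the last `window` smoothed points."""
    if len(returns) < 2 * window:
        return False  # too few episodes to judge
    recent = moving_average(returns, window)[-window:]
    return all(abs(r - target) <= tol for r in recent)


if __name__ == "__main__":
    # Synthetic data: a return still rising from -1.0 toward -0.5,
    # versus a return oscillating tightly around 0.
    improving = [-1.0 + i / 1000 for i in range(500)]
    settled = [0.02 if i % 2 == 0 else -0.02 for i in range(500)]
    print(near_equilibrium(improving))  # False
    print(near_equilibrium(settled))    # True
```

A return that hovers near 0 against the training opponent is only a coarse signal. On its own it does not show that the learned policy is non-exploitable.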