quantumiracle / MARS

MARS is short for Multi-Agent Research Studio, a library for multi-agent reinforcement learning research.
Apache License 2.0
44 stars, 2 forks

policy export error #1

Open hwz9612 opened 1 year ago

hwz9612 commented 1 year ago

Hi, when I use NFSP to train on my environment, I get the following error:

RuntimeError: Function 'SoftmaxBackward0' returned nan values in its 0th output.

While debugging, I found that self.policy(state) outputs zeros inside agent.update, as the screenshot shows:

[image]

Because part of the output is 0, the corresponding log_probs values are infinite. In my environment, observation_space and action_space are defined as follows:

self.observation_space = spaces.Box(low=0, high=1000, shape=(4,), dtype=np.float32)
self.action_space = spaces.Discrete(37)

Can you give me some suggestions? Thanks.

quantumiracle commented 1 year ago

Sorry for the late reply.

It's hard to provide constructive suggestions without more information about your training progress. The NFSP algorithm uses DQN as its base agent and iteratively learns an approximate best response against a set of the opponent's historical strategies. Please check whether training a single DQN against a fixed policy also reproduces this error.
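The reported symptom (zeros in the policy output, then infinite log-probabilities) can be reproduced in isolation. The sketch below is plain Python, not MARS code, and it assumes that log_probs is computed as the log of a softmax output, which the issue suggests but does not confirm. The helper names and the example logits are hypothetical. It compares log(softmax(x)) with a log-sum-exp formulation that stays finite when some probabilities underflow to 0.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def naive_log_probs(logits):
    # log(softmax(x)): entries that underflow to 0.0 become -inf.
    # math.log(0.0) raises in Python, so zeros are mapped to -inf explicitly.
    return [math.log(p) if p > 0.0 else float("-inf") for p in softmax(logits)]

def stable_log_probs(logits):
    # log_softmax(x) = x - logsumexp(x), computed without forming the probabilities.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

# Hypothetical logits with a very large spread, as might arise from unscaled inputs.
logits = [1000.0, 0.0, -1000.0]
naive = naive_log_probs(logits)
stable = stable_log_probs(logits)
print(naive)
print(stable)
```

In PyTorch, torch.nn.functional.log_softmax computes the same quantity as stable_log_probs here, so applying it to the logits is one possible way to avoid taking the log of a zero probability. Observations in the range 0 to 1000 may also contribute to large logits, so rescaling them before they reach the network might be worth trying; both points are suggestions under the stated assumption rather than a confirmed diagnosis.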