Closed janetwise closed 1 year ago
Hi Janet,
Nice observation! I think the name is a misnomer, though; it should really be called AugmentedCritic
instead, since it takes all states as input. At some point we did try the original MADDPG architecture with actions, but it wasn't as performant as this version, so we decided to use this one instead.
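To make the distinction concrete, here is a minimal sketch of a centralized critic that conditions on all agents' states but not their actions, in contrast to the original MADDPG critic, which also takes the joint actions. The class name and sizes are illustrative assumptions, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

class AugmentedCriticSketch(nn.Module):
    """Hedged sketch: a state-only centralized critic.

    It sees every agent's state (hence "augmented") but, unlike the
    original MADDPG critic, receives no actions. All names/sizes here
    are assumptions for illustration.
    """

    def __init__(self, n_agents, state_dim, hidden_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * state_dim, hidden_size),
            nn.ELU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, states):
        # states: (batch, n_agents, state_dim) -> value: (batch, 1)
        return self.net(states.flatten(start_dim=1))
```

The original MADDPG critic would instead concatenate the joint actions onto this input before the first linear layer.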
Thanks for your quick check! The mamba code didn't perform well on my environment; how should I go about debugging it? The actor loss, model loss, and value loss did go down... Are there any critical hyper-parameters I should check?
I would advise running a grid search on hyper-parameters if you have the resources, since in general it can be quite difficult to pinpoint which parameters will be optimal for a given environment. In our experience, the most critical hyper-parameters were:
The other parameters were actually quite robust across Flatland and StarCraft, so they are less likely to affect the training result. However, you can take a look at these hyper-parameters as well:
Hope this helps!
This is very helpful, thank you so much! I have a very large action space (100+), and my average episode length is 100–200 steps; what should my seq_length be? Around 100–200? Given my large action space, what should action_hidden and the other network sizes be? Should they be 10x larger than the SMAC or Flatland network sizes?
My model losses are converging fine, but my agent's return isn't going up, and neither are Value/value and value/reward; they all remain nearly constant. What should I look at then? My entropy only fluctuates within a very small range.
Could it be a bug that the actions are not used in MADDPGCritic.forward()? The actions appear in the forward signature but are never used.
```python
class MADDPGCritic(nn.Module):
    def __init__(self, in_dim, hidden_size, activation=nn.ELU):
        super().__init__()
        self.feedforward_model = build_model(hidden_size, 1, 1, hidden_size, activation)
        self._attention_stack = AttentionEncoder(1, hidden_size, hidden_size)
        self.embed = nn.Linear(in_dim, hidden_size)
        self.prior = build_model(in_dim, 1, 3, hidden_size, activation)
```
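The pattern being reported can be reproduced in isolation: a critic whose forward accepts an `actions` argument for interface compatibility but never reads it, so the output depends only on the states. This is a self-contained sketch (class name, sizes, and layers are my assumptions, not the repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateOnlyCritic(nn.Module):
    """Minimal reproduction of the reported pattern: `actions` is in the
    forward signature but unused, so the critic is effectively state-only.
    All names and sizes here are illustrative assumptions."""

    def __init__(self, in_dim, hidden_size=32):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden_size)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, states, actions=None):
        # `actions` is accepted but ignored; the value depends on states only.
        return self.value_head(F.elu(self.embed(states)))
```

Passing different actions (or none at all) produces identical outputs, which is consistent with the maintainer's point that the class behaves as an "AugmentedCritic" despite its name.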