jbr-ai-labs / mamba

This code accompanies the paper "Scalable Multi-Agent Model-Based Reinforcement Learning".

critic value function doesn't use action? #9

Closed janetwise closed 1 year ago

janetwise commented 1 year ago

Could it be a bug that the action is not used in MADDPGCritic.forward()? The actions are in the forward signature but never used.

class MADDPGCritic(nn.Module):
    def __init__(self, in_dim, hidden_size, activation=nn.ELU):
        super().__init__()
        self.feedforward_model = build_model(hidden_size, 1, 1, hidden_size, activation)
        self._attention_stack = AttentionEncoder(1, hidden_size, hidden_size)
        self.embed = nn.Linear(in_dim, hidden_size)
        self.prior = build_model(in_dim, 1, 3, hidden_size, activation)

    def forward(self, state_features, actions):  # `actions` is in the signature but never used below
        n_agents = state_features.shape[-2]
        batch_size = state_features.shape[:-2]
        embeds = F.elu(self.embed(state_features))
        embeds = embeds.view(-1, n_agents, embeds.shape[-1])
        attn_embeds = F.elu(self._attention_stack(embeds).view(*batch_size, n_agents, embeds.shape[-1]))
        return self.feedforward_model(attn_embeds)
vladimirrim commented 1 year ago

Hi Janet,

Nice observation! I think this one is a misnomer though: it should really be called AugmentedCritic, since we take all agents' states as input. At some point we did try the original MADDPG architecture with actions, but it wasn't as performant as this version, so we decided to use this one instead.
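For contrast, here is a minimal sketch of what an action-conditioned (original MADDPG-style) critic could look like. This is not the repository's implementation; the module and its names (ActionConditionedCritic, value_head) are hypothetical. It simply concatenates each agent's state features with its action vector before the embedding, so the value estimate depends on the joint actions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionConditionedCritic(nn.Module):
    """Hypothetical sketch, not the repo's code: a critic that also consumes actions."""

    def __init__(self, state_dim, action_dim, hidden_size):
        super().__init__()
        # concatenate per-agent state features and actions before embedding
        self.embed = nn.Linear(state_dim + action_dim, hidden_size)
        self.value_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ELU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, state_features, actions):
        # state_features: (..., n_agents, state_dim), actions: (..., n_agents, action_dim)
        x = torch.cat([state_features, actions], dim=-1)
        return self.value_head(F.elu(self.embed(x)))

# usage sketch: batch of 4, 2 agents, 16-dim state features, 5-dim (one-hot) actions
critic = ActionConditionedCritic(state_dim=16, action_dim=5, hidden_size=64)
values = critic(torch.randn(4, 2, 16), torch.randn(4, 2, 5))  # -> shape (4, 2, 1)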

janetwise commented 1 year ago

Thanks for your quick check! The mamba code didn't perform well on my environment; how should I go about debugging it? The actor loss, model loss, and value loss did go down... Are there any critical hyper-parameters I should check?

vladimirrim commented 1 year ago

I would advise running a grid search on hyper-parameters if you have the resources, since in general it can be quite difficult to pinpoint which parameters will be optimal for a given environment. In our experience, the most critical hyper-parameters were:

Other parameters were quite robust across Flatland and StarCraft, so they are less likely to affect the training result. However, you can take a look at these hyper-parameters as well:

Hope this helps!
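As a side note, a minimal sketch of the grid-search suggestion above, assuming a hypothetical search space (the parameter names below are placeholders, not the repository's actual config keys):

import itertools

# hypothetical search space; seq_length, hidden_size and lr are placeholder names
search_space = {
    "seq_length": [50, 100, 200],
    "hidden_size": [256, 512],
    "lr": [1e-4, 3e-4],
}

# enumerate every combination; each `config` would drive one training run,
# and the run with the best mean episode return wins
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space, values))
    print(config)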

janetwise commented 1 year ago

This is very helpful, thank you so much! I have a very large action space (100+), and my average episode length is 100 to 200 steps. What should my seq_length be? Around 100 to 200? With my large action space, what should action_hidden and the other network sizes be? Should they be 10x larger than the SMAC or Flatland network sizes?

My model losses are converging fine, but my agent/return isn't going up, and Value/value and value/reward also remain nearly constant. What should I look at then? My entropy only fluctuates within a very small range.