jbr-ai-labs / mamba

This code accompanies the paper "Scalable Multi-Agent Model-Based Reinforcement Learning".

critic value function doesn't use action? #9

Closed janetwise closed 1 year ago

janetwise commented 1 year ago

Could it be a bug that the action is not used in MADDPGCritic.forward()? The actions are in the forward signature but never used.

class MADDPGCritic(nn.Module):
    def __init__(self, in_dim, hidden_size, activation=nn.ELU):
        super().__init__()
        self.feedforward_model = build_model(hidden_size, 1, 1, hidden_size, activation)
        self._attention_stack = AttentionEncoder(1, hidden_size, hidden_size)
        self.embed = nn.Linear(in_dim, hidden_size)
        self.prior = build_model(in_dim, 1, 3, hidden_size, activation)

    def forward(self, state_features, actions):  # `actions` is in the signature but never used below
        n_agents = state_features.shape[-2]
        batch_size = state_features.shape[:-2]
        embeds = F.elu(self.embed(state_features))
        embeds = embeds.view(-1, n_agents, embeds.shape[-1])
        attn_embeds = F.elu(self._attention_stack(embeds).view(*batch_size, n_agents, embeds.shape[-1]))
        return self.feedforward_model(attn_embeds)
vladimirrim commented 1 year ago

Hi Janet,

Nice observation! I think this one is a misnomer though: it should really be called AugmentedCritic, since we take all agents' states as input. At some point we did try the original MADDPG architecture with actions, but it wasn't as performant as this version, so we decided to use this one instead.
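For contrast, here is a minimal sketch of what an action-conditioned (original MADDPG-style) critic could look like. This is not the repository's implementation; the module and its names (ActionConditionedCritic, value_head) are hypothetical. It simply concatenates each agent's state features with its action vector before the embedding, so the value estimate depends on the joint actions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionConditionedCritic(nn.Module):
    """Hypothetical sketch, not the repo's code: a critic that also consumes actions."""

    def __init__(self, state_dim, action_dim, hidden_size):
        super().__init__()
        # concatenate per-agent state features and actions before embedding
        self.embed = nn.Linear(state_dim + action_dim, hidden_size)
        self.value_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ELU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, state_features, actions):
        # state_features: (..., n_agents, state_dim), actions: (..., n_agents, action_dim)
        x = torch.cat([state_features, actions], dim=-1)
        return self.value_head(F.elu(self.embed(x)))

# usage sketch: batch of 4, 2 agents, 16-dim state features, 5-dim (one-hot) actions
critic = ActionConditionedCritic(state_dim=16, action_dim=5, hidden_size=64)
values = critic(torch.randn(4, 2, 16), torch.randn(4, 2, 5))  # -> shape (4, 2, 1)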

janetwise commented 1 year ago

Thanks for your quick check! The mamba code didn't perform well on my environment; how should I go about debugging it? The actor loss, model loss, and value loss did go down... Are there any critical hyper-parameters I should check?

vladimirrim commented 1 year ago

I would advise running a grid search on hyper-parameters if you have the resources, since in general it can be quite difficult to pinpoint which parameters will be optimal for a given environment. In our experience, the most critical hyper-parameters were:

Other parameters were quite robust across Flatland and StarCraft, so they are less likely to affect the training result. However, you can take a look at these hyper-parameters as well:

Hope this helps!
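As a side note, a minimal sketch of the grid-search suggestion above, assuming a hypothetical search space (the parameter names below are placeholders, not the repository's actual config keys):

import itertools

# hypothetical search space; seq_length, hidden_size and lr are placeholder names
search_space = {
    "seq_length": [50, 100, 200],
    "hidden_size": [256, 512],
    "lr": [1e-4, 3e-4],
}

# enumerate every combination; each `config` would drive one training run,
# and the run with the best mean episode return wins
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space, values))
    print(config)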

janetwise commented 1 year ago

This is very helpful, thank you so much! I have a very large action space (100+), and my average episode length is 100 to 200 steps. What should my seq_length be? Around 100 to 200? With my large action space, what should action_hidden and the other network sizes be? Should they be 10x larger than the SMAC or Flatland network sizes?

My model losses are converging fine, but my agent/return isn't going up, and Value/value and value/reward also remain nearly constant. What should I look at then? My entropy only fluctuates within a very small range.