shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" (ICML 2019)
MIT License

How to visualize the attention weights between agents in the testing phase? #10

Closed · soada closed this issue 5 years ago

soada commented 5 years ago

Hi, thanks for your great work! I have a question about how to visualize the attention weights between agents in the testing phase, i.e., Figure 6 in your article. Could you please give me some advice? Thank you very much!

shariqiqbal2810 commented 5 years ago

Hi,

You can use this flag (https://github.com/shariqiqbal2810/MAAC/blob/master/utils/critics.py#L162) to return the attention weights of each agent over the other agents for all the time points that are passed in as input.

soada commented 5 years ago

Thanks for the instructions! But I still have two questions:

  1. Is the flag used to return the attention weights of the samples collected during training?
  2. Can I obtain the attention weights at a fixed time-step during evaluation (i.e., the decentralized execution process)?

shariqiqbal2810 commented 5 years ago

The flag is simply used whenever you call the forward pass on the critic module. Example:

```python
critic = AttentionCritic(sa_sizes)
# inps is a list with one (state_batch, action_batch) pair per agent
rets = critic(inps, return_q=True, return_attend=True)
# rets[0][0] contains the Q-values for agent 0 corresponding to the inputs,
# and rets[0][1] contains the attention weights for agent 0
```

As such, the attention weights are calculated for whatever states and actions you pass into the critic during the forward pass, so you can calculate the attention weights both during training and execution if you would like.
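
For concreteness, a self-contained sketch of such a forward pass might look like the following; the agent count, observation/action dimensions, and random inputs are placeholders for illustration, with one (state batch, action batch) pair per agent:

```python
import torch
from utils.critics import AttentionCritic

# Hypothetical setup: 3 agents, each with 10-dim observations and 4-dim actions
sa_sizes = [(10, 4) for _ in range(3)]
critic = AttentionCritic(sa_sizes)

batch_size = 32
# One (state batch, action batch) pair per agent
inps = [(torch.rand(batch_size, s_dim), torch.rand(batch_size, a_dim))
        for s_dim, a_dim in sa_sizes]

rets = critic(inps, return_q=True, return_attend=True)
q_vals_agent0, attend_agent0 = rets[0]  # Q-values and attention weights for agent 0
```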

soada commented 5 years ago

Thanks for your advice! In the execution process, the agents only get observations.

  1. Should I first get the actions from the policies and then feed the obs-action pairs to the critic to calculate the attention weights? Is there any way to calculate the weights from observations alone?
  2. As for Figure 6 in your article, when the rover is paired with different towers, are the attention weights calculated during training (averaged over several runs) or during execution?
  3. If the attention weights change dynamically within an episode, how can they be visualized? Thanks very much!

shariqiqbal2810 commented 5 years ago

  1. Yes, that is correct. The attention weights are calculated as part of the state-action value prediction network, so there is no way to get them without inputting actions (see the sketch after this list).
  2. For Figure 6, the "attention entropy" is reported as an average over all data points in the mini-batch provided during training. It's important to note that Figure 6 does not plot the actual attention weights, but rather their entropy (i.e., how uniformly the attention weights are distributed).
  3. You can simply plot the attention weights on a per-timestep basis.
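
Putting points 1 and 3 together, a rough sketch of collecting agent 0's attention weights over an episode during execution and plotting them as a heatmap might look like this. The random observations and actions below are stand-ins for real environment observations and policy outputs, and the exact shape of the returned weights depends on the number of attention heads, so the reshaping is illustrative:

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from utils.critics import AttentionCritic

# Hypothetical setup; in practice the critic and policies come from a trained model
n_agents, obs_dim, act_dim, ep_len = 3, 10, 4, 25
critic = AttentionCritic([(obs_dim, act_dim)] * n_agents, attend_heads=1)
critic.eval()

weights_over_time = []  # agent 0's attention weights at each timestep
for t in range(ep_len):
    # Stand-ins for the environment's observations and the policies' actions
    obs = [torch.rand(1, obs_dim) for _ in range(n_agents)]
    acts = [torch.rand(1, act_dim) for _ in range(n_agents)]
    with torch.no_grad():
        rets = critic(list(zip(obs, acts)), return_q=True, return_attend=True)
    # rets[0][1]: agent 0's weights over the other agents (per attention head)
    weights_over_time.append(np.asarray(rets[0][1], dtype=float).ravel())

weights = np.stack(weights_over_time)  # (ep_len, heads * (n_agents - 1))

# With a single head, each row is a distribution over the other agents, so its
# entropy measures how uniformly agent 0 spreads its attention (cf. Figure 6)
entropy = -(weights * np.log(weights + 1e-12)).sum(axis=1)

plt.imshow(weights, aspect='auto', cmap='viridis')
plt.title('agent 0 attention (mean entropy: %.3f)' % entropy.mean())
plt.xlabel('attended agent')
plt.ylabel('timestep')
plt.colorbar(label='attention weight')
plt.show()
```
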
soada commented 5 years ago

Thank you very much! In fact, the figure I mentioned is the following one (maybe Figure 7 in your final version), whose caption is "Attention weights when subjected to different Tower pairings for Rover 1 in Rover-Tower environment":

[image]

Are the attention weights in that figure calculated during training (averaged over several runs) or during execution?

shariqiqbal2810 commented 5 years ago

Oh I see. These are calculated from a single timepoint during execution.

GoingMyWay commented 4 years ago

> Hi,
>
> You can use this flag (https://github.com/shariqiqbal2810/MAAC/blob/master/utils/critics.py#L162) to return the attention weights of each agent over the other agents for all the time points that are passed in as input.

Hi, sir, is all_attend_probs[i] the attention weights of agent i over the other agents, or the attention weights of the agents other than agent i?