In the code:
the input of sel_ext (query) is state_encodings
the input of k_ext (key) is state_action_encodings
the input of v_ext (value) is state_action_encodings
In the paper, the input of both the key and the query should be state_action_encodings.
I think the correct inputs should be:
the input of sel_ext (query) is state_action_encodings (changed)
the input of k_ext (key) is state_action_encodings
the input of v_ext (value) is state_encodings (changed)
Could you explain why this is done in the code?
We output an action-value for each possible action, rather than feeding a specific action as input, so the query for an agent is computed from state_encodings alone. This is explained in the section of the paper entitled "Multi-Agent Advantage Function".
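To make the point above concrete, here is a minimal sketch in plain Python, not the repository's actual implementation. It assumes a single attention head, a scaled dot-product score, and a single linear output layer; the names critic_q_values, W_sel, W_key, W_val and W_out are hypothetical stand-ins for sel_ext, k_ext, v_ext and the critic head. The query uses only the agent's own state encoding, the keys and values use the other agents' state-action encodings, and the result has one Q-value per possible action.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def critic_q_values(i, state_encodings, state_action_encodings,
                    W_sel, W_key, W_val, W_out):
    """Return (Q-values for every action of agent i, attention weights)."""
    # Query: agent i's state encoding only. Its action is NOT an input,
    # because the critic outputs a value for each of its possible actions.
    query = matvec(W_sel, state_encodings[i])

    # Keys and values: state-action encodings of all OTHER agents.
    others = [j for j in range(len(state_encodings)) if j != i]
    keys = [matvec(W_key, state_action_encodings[j]) for j in others]
    values = [matvec(W_val, state_action_encodings[j]) for j in others]

    # Scaled dot-product attention (assumed scoring function).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    attended = [sum(w * v[d] for w, v in zip(weights, values))
                for d in range(len(values[0]))]

    # Critic head: concatenate own state encoding with the attended values
    # and map to one Q-value per action (a linear layer here for brevity).
    critic_in = state_encodings[i] + attended
    return matvec(W_out, critic_in), weights


rng = random.Random(0)
n_agents, enc_dim, att_dim, n_actions = 3, 4, 4, 5  # illustrative sizes
state_encodings = [[rng.uniform(-1, 1) for _ in range(enc_dim)]
                   for _ in range(n_agents)]
state_action_encodings = [[rng.uniform(-1, 1) for _ in range(enc_dim)]
                          for _ in range(n_agents)]
W_sel = rand_matrix(att_dim, enc_dim, rng)
W_key = rand_matrix(att_dim, enc_dim, rng)
W_val = rand_matrix(att_dim, enc_dim, rng)
W_out = rand_matrix(n_actions, enc_dim + att_dim, rng)

q_values, weights = critic_q_values(0, state_encodings, state_action_encodings,
                                    W_sel, W_key, W_val, W_out)
print(len(q_values), len(weights))
```

With this layout, the Q-value of the action the agent actually took is obtained by indexing into q_values, and the remaining entries are available for computing a baseline over the agent's other actions without extra forward passes.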