shariqiqbal2810 / maddpg-pytorch

PyTorch Implementation of MADDPG (Lowe et. al. 2017)
MIT License

Update value function with different action types, why? #32

Open tessavdheiden opened 3 years ago

tessavdheiden commented 3 years ago

Hi Shariq,

In your code you update the value function with actions computed in two ways: 1) gumbel_softmax and 2) onehot_from_logits.

As far as I know, 1) has the gradient attached, while 2) does not.

Why did you implement it this way?
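To make the distinction concrete, here is a minimal standalone sketch of the gradient behavior. It uses PyTorch's built-in `F.gumbel_softmax` and a plain argmax one-hot as stand-ins for the repo's `gumbel_softmax` and `onehot_from_logits` helpers, which may differ in detail:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, requires_grad=True)
w = torch.randn(4)  # stand-in for a downstream critic's sensitivity to the action

# 1) Gumbel-Softmax with the straight-through trick: the output is one-hot in
#    the forward pass, but gradients still flow back to the logits.
a_gumbel = F.gumbel_softmax(logits, hard=True)
(a_gumbel * w).sum().backward()
print(logits.grad)  # non-zero: gradients flow through the sampled action

# 2) argmax -> one-hot (what onehot_from_logits amounts to): argmax is not
#    differentiable, so this path is detached from the graph.
a_onehot = F.one_hot(logits.argmax(dim=-1), num_classes=4).float()
print(a_onehot.requires_grad)  # False: no gradient can reach the logits
```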

uhlajs commented 3 years ago

Hi @tessavdheiden, I believe that in one call of update you want to update the policy of just one actor (namely the actor attached to agent_i). That is why you pass one action sample with the gradient attached (the action of agent_i, produced by gumbel_softmax) and the others without (produced by onehot_from_logits).
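A hedged sketch of what such an actor update can look like; `policies`, `critic`, and `obs` are hypothetical stand-ins for the repo's per-agent policies, centralized critic, and observation batch, and `onehot_from_logits` is reimplemented inline rather than imported from the repo:

```python
import torch
import torch.nn.functional as F

def onehot_from_logits(logits):
    # Non-differentiable greedy one-hot (sketch of the repo's helper).
    return F.one_hot(logits.argmax(dim=-1), logits.shape[-1]).float()

def actor_loss(agent_i, policies, critic, obs):
    all_actions = []
    for i, (pi, ob) in enumerate(zip(policies, obs)):
        if i == agent_i:
            # Differentiable sample: gradients reach agent_i's policy.
            all_actions.append(F.gumbel_softmax(pi(ob), hard=True))
        else:
            # Detached greedy actions: the other policies get no gradient here.
            all_actions.append(onehot_from_logits(pi(ob)))
    vf_in = torch.cat([*obs, *all_actions], dim=1)
    # Maximize the centralized critic's value w.r.t. agent_i's policy only.
    return -critic(vf_in).mean()
```

The detached actions effectively hold the other agents' policies fixed during agent_i's update, so each actor is optimized against its peers' current greedy behavior.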