Closed · matteobettini closed this 5 months ago
To me, DDPG is the continuous version of DQN (IIRC this is how it was presented to the community). I guess we could make it possible to use GumbelSoftmax or other hacks, but I've rarely seen any of these tricks truly work in real settings (don't quote me on this, I could be wrong). Is there something fundamentally different between DDPG with these sorts of tricks and regular DDPG that would require a new class?
Closing as non-actionable. Happy to reopen if we can identify action items.
I think the main value-add here is that, in the BenchMARL repository, we found IDDPG and MADDPG tend to perform best out of all the algorithms.
However, these algorithms currently don't work for discrete action spaces, and supporting them would be nice (specifically for meltingpot environments). The original MADDPG work also seems to have used a similar Gumbel-Softmax trick.
https://www.reddit.com/r/reinforcementlearning/comments/g5cjzh/ddpg_for_discrete_actions/
I'm not against anything, I just need something actionable. What's the bug? Where do things break? What did you try that didn't behave as expected?
Since there is a version of SAC for discrete actions (https://github.com/pytorch/rl/blob/2461eb20d21b79a410e01aed71c26b77712a30d8/torchrl/objectives/sac.py#L792), I was wondering what the process would be to enable a version of DDPG with discrete actions.
Since the two algorithms are similar, my guess is that the process would also be similar. Another possibility is to emulate continuous actions using the Gumbel-Softmax estimator (https://arxiv.org/abs/1611.01144), as sketched below.
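For concreteness, here is a minimal sketch of what the Gumbel-Softmax route could look like. This is plain PyTorch, not the torchrl API; the `DiscreteDDPGActor` name, layer sizes, and throwaway critic are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscreteDDPGActor(nn.Module):
    """Hypothetical actor that outputs a (relaxed) one-hot action via Gumbel-Softmax."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # logits over the discrete actions
        )

    def forward(self, obs: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.net(obs)
        # hard=True returns a one-hot sample in the forward pass but uses the
        # soft relaxation for the backward pass (straight-through estimator),
        # so the critic's gradient can flow into the actor as in standard DDPG.
        return F.gumbel_softmax(logits, tau=tau, hard=True)


# Usage: the actor loss stays the usual deterministic policy-gradient surrogate.
obs_dim, n_actions = 8, 4
actor = DiscreteDDPGActor(obs_dim, n_actions)
critic = nn.Sequential(nn.Linear(obs_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, 1))

obs = torch.randn(32, obs_dim)
action = actor(obs)  # (32, n_actions), one-hot in the forward pass
actor_loss = -critic(torch.cat([obs, action], dim=-1)).mean()
actor_loss.backward()  # gradients reach the actor through the soft relaxation
```

AFAIK this straight-through variant is roughly what the original MADDPG code did for discrete actions; whether it actually trains well in practice is the open question raised above.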
cc @ezhang7423