pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] DDPG with discrete actions #2055

Closed · matteobettini closed this 5 months ago

matteobettini commented 5 months ago

Since there is a version of SAC for discrete actions (https://github.com/pytorch/rl/blob/2461eb20d21b79a410e01aed71c26b77712a30d8/torchrl/objectives/sac.py#L792), I was wondering what the process would be to enable a version of DDPG with discrete actions.

Since the two algorithms are similar, my guess is that the process should be similar too? Another possibility is to emulate continuous actions using the Gumbel-Softmax estimator (https://arxiv.org/abs/1611.01144).
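
For reference, a minimal sketch of that estimator in plain PyTorch. The `torch.nn.functional.gumbel_softmax` call is the library's real API; the shapes and logits here are just illustrative:

```python
import torch
import torch.nn.functional as F

# Batch of 8 samples over 4 discrete actions; logits are illustrative.
logits = torch.randn(8, 4, requires_grad=True)

# Soft sample: a differentiable point on the probability simplex.
soft_action = F.gumbel_softmax(logits, tau=1.0, hard=False)

# Straight-through sample: exact one-hot in the forward pass,
# gradient of the soft relaxation in the backward pass.
hard_action = F.gumbel_softmax(logits, tau=1.0, hard=True)

hard_action.sum().backward()
print(logits.grad.shape)  # torch.Size([8, 4]): gradients reach the logits
```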

cc @ezhang7423

vmoens commented 5 months ago

To me, DDPG is the continuous version of DQN (IIRC this is how it was presented to the community). I guess we can make it possible to use Gumbel-Softmax or other hacks, but I've rarely seen any of these tricks truly work in real settings (don't quote me on this, I could be wrong). Is there something fundamentally different between DDPG with these sorts of tricks and regular DDPG that would require a new class?
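
For concreteness, here is a minimal sketch (plain PyTorch with hypothetical one-layer networks, not the torchrl loss modules) of how such a hack would plug into the standard DDPG actor update. Only the action head changes; the deterministic-policy-gradient objective stays the same:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 16, 4

# Hypothetical single-layer actor and critic, for illustration only.
actor = nn.Linear(obs_dim, n_actions)       # obs -> action logits
critic = nn.Linear(obs_dim + n_actions, 1)  # (obs, one-hot action) -> Q

obs = torch.randn(32, obs_dim)

# Differentiable one-hot action in place of DDPG's continuous action.
action = F.gumbel_softmax(actor(obs), tau=1.0, hard=True)

# Unchanged DDPG actor objective: maximize Q at the policy's action.
actor_loss = -critic(torch.cat([obs, action], dim=-1)).mean()
actor_loss.backward()  # gradients reach the actor through the relaxed sample
```

Under this reading, the critic and target-network updates would be untouched, which suggests the change is closer to a policy-head option than a new loss class.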

vmoens commented 5 months ago

Closing as non-actionable. Happy to reopen if we can identify action items.

ezhang7423 commented 5 months ago

I think the main value-add here is that, in the BenchMARL repository, we found that IDDPG and MADDPG tend to perform best out of all algorithms. [image: BenchMARL benchmark results]

However, these algorithms currently don't work for discrete action spaces, and supporting those would be nice (specifically for the Meltingpot environments). It also seems that the original MADDPG work used a similar Gumbel-Softmax trick.

[image] See also this discussion: https://www.reddit.com/r/reinforcementlearning/comments/g5cjzh/ddpg_for_discrete_actions/
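
For context, the trick in question is the straight-through estimator: act with a hard one-hot sample while letting gradients flow through the soft relaxation. A hand-written sketch of it (equivalent to passing `hard=True` to `torch.nn.functional.gumbel_softmax`):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 4, requires_grad=True)
y_soft = F.gumbel_softmax(logits, tau=1.0, hard=False)

# Hard one-hot of the soft sample's argmax (non-differentiable on its own).
index = y_soft.argmax(dim=-1, keepdim=True)
y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)

# Straight-through: forward value is y_hard, backward gradient is y_soft's.
action = y_hard - y_soft.detach() + y_soft
```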

vmoens commented 5 months ago

I'm not against anything, I just need something actionable. What's the bug? Where do things break? What did you try that didn't behave as expected?