roger-creus opened 6 days ago
We have a version of this here: https://pytorch.org/rl/stable/reference/generated/torchrl.objectives.DistributionalDQNLoss.html#torchrl.objectives.DistributionalDQNLoss, but I don't think we have an official version of Rainbow yet (even though this was one of the first things we had in the lib, for some reason we never wrote a script that was high-quality enough to make public!). LMK if you need further help with it!
I have implemented a first version of Rainbow containing all the tricks (Dueling DQN, distributional Q-learning, Prioritized Experience Replay, etc.), and I am now running some preliminary experiments to debug its performance and make sure it works well.
However, I had to change this line to:

```python
Tz = reward + (1 - terminated.to(reward.dtype)) * discount.unsqueeze(-1) * support.repeat(batch_size, 1)
```

Otherwise I would get shape errors. Let me know if this makes sense!
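For context, here is a minimal standalone sketch of the broadcasting involved in that target computation. The tensor names and shapes are assumptions for illustration, not TorchRL's actual internals:

```python
import torch

# Hypothetical shapes for illustration (batch of 4 transitions, 51 atoms as in C51)
batch_size, n_atoms = 4, 51
reward = torch.randn(batch_size, 1)                 # per-transition reward, shape (B, 1)
terminated = torch.zeros(batch_size, 1, dtype=torch.bool)  # done flags, shape (B, 1)
discount = torch.full((batch_size,), 0.99)          # per-transition discount, shape (B,)
support = torch.linspace(-10.0, 10.0, n_atoms)      # C51 atom support, shape (n_atoms,)

# Projected target support: the per-sample scalars broadcast over the atom dimension.
# unsqueeze(-1) lifts discount to (B, 1); repeat tiles support to (B, n_atoms).
Tz = reward + (1 - terminated.to(reward.dtype)) * discount.unsqueeze(-1) * support.repeat(batch_size, 1)

assert Tz.shape == (batch_size, n_atoms)
```

With `discount` left as shape `(B,)`, multiplying it directly against the `(B, n_atoms)` tiled support would raise a broadcasting error, which is presumably the shape mismatch described above.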
Is the Distributional Q-Value Actor currently fully supported? If so, are there any plans to integrate C51 and, more importantly, Rainbow into the list of sota-implementations?