roger-creus opened 6 days ago
We have a version of this here: https://pytorch.org/rl/stable/reference/generated/torchrl.objectives.DistributionalDQNLoss.html#torchrl.objectives.DistributionalDQNLoss, but I don't think we have an official version of Rainbow yet (even though this was one of the first things we had in the lib, for some reason we never wrote a script that was high-quality enough to make public!). LMK if you need further help with it!
I have implemented a first version of Rainbow containing all the tricks (Dueling DQN, distributional Q-learning, Prioritized Experience Replay, etc.), and I am now running some preliminary experiments to debug its performance and make sure it works well.
However, I had to change this line to:

```python
Tz = reward + (1 - terminated.to(reward.dtype)) * discount.unsqueeze(-1) * support.repeat(batch_size, 1)
```

Otherwise I would get shape errors. Let me know if this makes sense!
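For context, here is a minimal standalone sketch of the broadcasting involved in that target computation. The tensor names and shapes are assumptions for illustration, not TorchRL's actual internals:

```python
import torch

# Hypothetical shapes for illustration (batch of 4 transitions, 51 atoms as in C51)
batch_size, n_atoms = 4, 51
reward = torch.randn(batch_size, 1)                 # per-transition reward, shape (B, 1)
terminated = torch.zeros(batch_size, 1, dtype=torch.bool)  # done flags, shape (B, 1)
discount = torch.full((batch_size,), 0.99)          # per-transition discount, shape (B,)
support = torch.linspace(-10.0, 10.0, n_atoms)      # C51 atom support, shape (n_atoms,)

# Projected target support: the per-sample scalars broadcast over the atom dimension.
# unsqueeze(-1) lifts discount to (B, 1); repeat tiles support to (B, n_atoms).
Tz = reward + (1 - terminated.to(reward.dtype)) * discount.unsqueeze(-1) * support.repeat(batch_size, 1)

assert Tz.shape == (batch_size, n_atoms)
```

With `discount` left as shape `(B,)`, multiplying it directly against the `(B, n_atoms)` tiled support would raise a broadcasting error, which is presumably the shape mismatch described above.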
Is the Distributional Q-Value Actor currently fully supported? If so, are there any plans to integrate C51 and, more importantly, Rainbow into the list of sota-implementations?