C51 agent not working on Pong

michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.

Apache License 2.0

99 stars, 8 forks

The C51 agent does not seem to learn on Pong, and it is sometimes unstable on CartPole.

Normally, for DQN-like agents using an ε-greedy policy, we'd expect the agent to start making progress once 'exploration_epsilon' drops below 0.7, but that's not the case for the C51 agent. What makes this issue more interesting is that the Rainbow agent works very well on Pong, and the code for Rainbow is almost identical to the C51 agent's, except that C51 doesn't use noisy layers.
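For reference, here is a minimal sketch of how ε-greedy action selection typically sits on top of a C51 output. It is not the code from this repo. The function names, the pure-Python list representation, and the support settings (51 atoms on [-10, 10]) are assumptions made for illustration; only 'exploration_epsilon' comes from the description above.

```python
import random


def expected_q_values(probs, atoms):
    """Collapse per-action categorical distributions into scalar Q-values.

    probs: per-action probabilities, shape [num_actions][num_atoms]
    atoms: support values z_i, length num_atoms
    Q(s, a) = sum_i z_i * p_i(s, a)
    """
    return [sum(p * z for p, z in zip(action_probs, atoms)) for action_probs in probs]


def epsilon_greedy_action(probs, atoms, exploration_epsilon, rng=random):
    """Pick a random action with probability exploration_epsilon, else the greedy one."""
    num_actions = len(probs)
    if rng.random() < exploration_epsilon:
        return rng.randrange(num_actions)
    q_values = expected_q_values(probs, atoms)
    return max(range(num_actions), key=q_values.__getitem__)


# Assumed support for illustration: 51 atoms evenly spaced on [-10, 10].
V_MIN, V_MAX, NUM_ATOMS = -10.0, 10.0, 51
atoms = [V_MIN + i * (V_MAX - V_MIN) / (NUM_ATOMS - 1) for i in range(NUM_ATOMS)]
```

With this structure, the greedy branch depends entirely on the predicted distributions, so if the distributions are not being learned correctly, lowering 'exploration_epsilon' would not be expected to help.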

Things we've tried so far, none of which solved the issue:

Using the same dueling architecture as in Rainbow
Increasing and decreasing the learning rate
Updating the target Q network less often and more often
Training much longer than for normal DQN or Rainbow

C51 agent not working on Pong #3