Bowen-He opened this issue 2 years ago
Hi, are these the results from Figure 14 in the 2017 paper? Those were evaluated with no-ops, while the Dopamine results use sticky actions (IIUC). Hope that helps.
OHHHH, thanks for your reply! I'm just trying to replicate the training results and investigate the performance gap between DQN and C51. I checked the c51_icml.gin file used for training, and it sets "sticky_actions = False", so I guess my training results should already be based on no-ops. I have the curves for three random seeds here. They've been trained for around 15M steps, which is still far from the 200M steps reported. But I think the curves should show a rapid increase to 500 and then climb gradually to 600. For now, the curves show the agents reaching 400 and seeming to plateau there. Would you suggest waiting for more steps before checking the results?
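For comparing runs like this, a minimal NumPy sketch of averaging the three seed curves (the return values below are made up for illustration, not the actual results):

```python
import numpy as np

# Hypothetical evaluation returns: one row per random seed,
# one column per training checkpoint.
seed_curves = np.array([
    [310.0, 350.0, 390.0],
    [295.0, 360.0, 405.0],
    [330.0, 345.0, 410.0],
])

# Mean and spread across seeds at each checkpoint.
mean_curve = seed_curves.mean(axis=0)
std_curve = seed_curves.std(axis=0)
print(mean_curve)
```

Plotting the mean with a shaded std band is the usual way to judge whether the plateau at ~400 is consistent across seeds or just one noisy run.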
Yes, you'll have to wait for the full 200M frames. In this case, there is no no-op evaluation (it's not implemented in Dopamine). No-ops shouldn't make a noticeable difference in the final score in most cases.
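One common source of confusion here is frames vs. agent steps: Dopamine's Atari agents act once every 4 frames (the default frame skip), so the 200M-frame budget corresponds to 50M agent steps. A quick sketch of the arithmetic, assuming the 15M figure above counts agent steps:

```python
# Dopamine's Atari pipeline uses a frame skip of 4 by default, so the
# paper's 200M-frame training budget is 200M / 4 = 50M agent steps.
FRAME_SKIP = 4
TOTAL_FRAMES = 200_000_000

total_agent_steps = TOTAL_FRAMES // FRAME_SKIP
current_agent_steps = 15_000_000  # where the run above currently is

progress = current_agent_steps / total_agent_steps
print(f"{progress:.0%} of the full training budget")  # -> 30%
```

So at 15M steps the run is only about a third of the way through, which is consistent with the curves not yet reaching the reported final scores.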
OK, let me finish all the training steps and check the final scores! Would you mind keeping this issue open? I think it's going to take some time to finish the training.
Hi, I'm trying to use Dopamine to replicate the published result of C51 on Breakout, which seems to be around 700. However, my training trials get stuck around 400-500 after a rapid initial increase in reward. I'm using the hyperparameter file c51_icml.gin, with only the game name changed. Could you give me some suggestions about what might be wrong?