Bowen-He opened this issue 2 years ago
Hi, are these the results from Figure 14 in the 2017 paper? Those were evaluated with no-ops, while the Dopamine results use sticky actions (IIUC). Hope that helps.
OHHHH, thanks for your reply! I'm just trying to replicate the training results and investigate the performance gap between DQN and C51. I checked the c51_icml.gin file used for training, and it sets "sticky_actions = False", so I guess my training results should already be based on no-ops. I have the curves for three random seeds here. They've been trained for around 15M steps, which is still far from the 200M steps reported. But I think the curves should show a rapid increase to 500 and then climb gradually to 600. For now, the curves show the agents reaching 400 and seeming to plateau there. Would you suggest waiting for more steps before checking the results?
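For comparing runs like this, a minimal NumPy sketch of averaging the three seed curves (the return values below are made up for illustration, not the actual results):

```python
import numpy as np

# Hypothetical evaluation returns: one row per random seed,
# one column per training checkpoint.
seed_curves = np.array([
    [310.0, 350.0, 390.0],
    [295.0, 360.0, 405.0],
    [330.0, 345.0, 410.0],
])

# Mean and spread across seeds at each checkpoint.
mean_curve = seed_curves.mean(axis=0)
std_curve = seed_curves.std(axis=0)
print(mean_curve)
```

Plotting the mean with a shaded std band is the usual way to judge whether the plateau at ~400 is consistent across seeds or just one noisy run.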
Yes, you'll have to wait for the full 200M frames. In this case, there is no no-op evaluation (it's not implemented in Dopamine). No-ops shouldn't make a noticeable difference in the final score in most cases.
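One common source of confusion here is frames vs. agent steps: Dopamine's Atari agents act once every 4 frames (the default frame skip), so the 200M-frame budget corresponds to 50M agent steps. A quick sketch of the arithmetic, assuming the 15M figure above counts agent steps:

```python
# Dopamine's Atari pipeline uses a frame skip of 4 by default, so the
# paper's 200M-frame training budget is 200M / 4 = 50M agent steps.
FRAME_SKIP = 4
TOTAL_FRAMES = 200_000_000

total_agent_steps = TOTAL_FRAMES // FRAME_SKIP
current_agent_steps = 15_000_000  # where the run above currently is

progress = current_agent_steps / total_agent_steps
print(f"{progress:.0%} of the full training budget")  # -> 30%
```

So at 15M steps the run is only about a third of the way through, which is consistent with the curves not yet reaching the reported final scores.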
OK, let me finish all the training steps and check the final scores! Would you mind keeping this issue open? I think it's going to take some time to finish the training.
Hi, I'm trying to use Dopamine to replicate the published result of C51 on Breakout, which seems to be around 700. However, my training trials get stuck around 400-500 after a rapid initial increase in reward. I'm using the hyperparameter file c51_icml.gin, with only the game name changed. Could you give me some suggestions about what might be wrong?