Has anyone tried this code for Breakout? What parameters did you use?
I used the default parameters from the code: gamma 0.99, lambda 1.0, learning rate 1e-4, gradient clip 40.
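For reference, those defaults correspond to something like the following TF1-style training-op setup (a minimal sketch under my assumptions; the function and variable names here are illustrative, not the repo's):

```python
import tensorflow as tf  # TF1-style graph API, matching the repo's era

def make_train_op(loss, learning_rate=1e-4, clip_norm=40.0):
    # Adam at 1e-4 with global-norm gradient clipping at 40, i.e. the
    # defaults quoted above (a sketch, not the exact repo code).
    params = tf.trainable_variables()
    grads = tf.gradients(loss, params)
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
    return tf.train.AdamOptimizer(learning_rate).apply_gradients(
        zip(clipped, params))
```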
I also can't reproduce the results for BreakoutDeterministic-v4 or SeaquestDeterministic-v4.
Breakout after 24h with 16 workers (3 independent training runs):
However, as noted in https://github.com/openai/universe-starter-agent/issues/87#issuecomment-294513932, each worker requires 2-3 cores.
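A trivial way to size the worker count from that rule of thumb (a sketch; dividing by 3, the upper end of the 2-3 core range, is my assumption):

```python
import multiprocessing

# Budget ~3 cores per A3C worker, per the rule of thumb above.
num_workers = max(1, multiprocessing.cpu_count() // 3)
print("workers:", num_workers)  # e.g. 8 on a 24-core machine
```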
The machine had 24 cores, so I ran it again for 14h with 8 workers (3 independent training runs):
And the A3C (16 workers) curve from https://arxiv.org/pdf/1602.01783.pdf (page 5) looks better:
I noticed something in the above A3C paper:
Specifically, we tuned hyperparameters (learning rate and amount of gradient norm clipping) using a search on six Atari games (Beamrider, Breakout, Pong, Q*bert, Seaquest and Space Invaders) and then fixed all hyperparameters for all 57 games.
Unfortunately, the paper doesn't seem to state which learning-rate and gradient-norm-clipping values they settled on, and they could be different from the defaults used in the code here.
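If anyone wants to redo that search, a minimal random-search loop over just those two hyperparameters might look like this (the sampling ranges are my assumption, not values from the paper):

```python
import random

def sample_config():
    # Log-uniform learning rate and a few gradient-clip candidates
    # (assumed ranges -- adjust for your own search budget).
    lr = 10 ** random.uniform(-4, -2)
    clip = random.choice([10.0, 40.0, 100.0])
    return {"learning_rate": lr, "grad_clip_norm": clip}

for trial in range(6):  # say, one trial per tuning game
    print(sample_config())
```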
Has anyone found optimal hyperparameters?
I haven't (I tried a bit), but this is helpful:
I guess universe-starter-agent has a correct implementation of A3C, but definitely with quite a few design changes, e.g., an unshared optimizer across workers, different hyperparameters (input size, learning rate, etc.), and a different network architecture. I first "tuned" it to make sure I could reproduce the ATARI results to some extent (note: it's quite hard to replicate the original paper's results because they use Torch and the initialization was different -- training is sensitive). I could get close to the results for "breakout" and a few other games in the "non-shared optimizer" scenario (see the original A3C paper's supplementary material), but did not get exactly the same numbers because of differences in initialization, TensorFlow vs. Torch, etc. By "tuning" above I meant: changing the architecture, changing the loss to a mean loss rather than a total loss, changing hyperparameters, etc.
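For concreteness, the mean-vs-total-loss change mentioned in that quote boils down to swapping the batch reduction, roughly like this (a sketch of a generic A3C loss, not the actual universe-starter-agent code; all names are illustrative):

```python
import tensorflow as tf

def a3c_loss(log_probs, actions_onehot, advantages, values, returns,
             value_coef=0.5, entropy_coef=0.01, use_mean=True):
    # `use_mean` toggles the "mean loss vs. total loss" change from the
    # quote above: reduce_mean averages over the batch, reduce_sum does not.
    reduce_fn = tf.reduce_mean if use_mean else tf.reduce_sum
    log_pi_a = tf.reduce_sum(log_probs * actions_onehot, axis=1)
    policy_loss = -reduce_fn(log_pi_a * advantages)
    value_loss = reduce_fn(tf.square(values - returns))
    entropy = -reduce_fn(tf.reduce_sum(tf.exp(log_probs) * log_probs, axis=1))
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```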
Here are the hyperparameters for the original A3C work, but for universe-starter-agent they would be different because of the significant implementation differences. It seems possible to find "working" ones for universe-starter-agent, but it takes real effort.
By the way, be aware that universe and universe-starter-agent seem to be deprecated: https://github.com/openai/universe/issues/218.
Hello there!
I tried to implement my own version of A3C using TensorFlow (here), but ended up not getting good results. Thus, I used the same network architecture as this implementation (universe-starter-agent) to see if it would change the results. Initially, I thought the default convolutional layers from TensorFlow (tensorflow.contrib.layers) were responsible. I then used the same convolution function used here, but to no avail. I have already checked the flow of my code and compared it to universe-starter-agent, and found them to be the same.
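In case it helps to compare, here is roughly how I read the universe-starter-agent conv tower (a paraphrase using tf.layers instead of the repo's custom conv2d helper; treat the details as my reading of model.py, not a copy of it):

```python
import tensorflow as tf

def usa_style_conv_stack(x):
    # Four 3x3, stride-2 conv layers with 32 filters and ELU activations,
    # applied to the repo's small 42x42x1 input (paraphrased, not copied).
    for i in range(4):
        x = tf.layers.conv2d(x, filters=32, kernel_size=3, strides=2,
                             padding="same", activation=tf.nn.elu,
                             name="conv%d" % i)
    return tf.layers.flatten(x)
```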
The environment that is giving me problems is Breakout. For Pong, for example, my code (with the current parameters) works very well. But when I try Breakout, I can't get past a score of 40. I have already tried several variations (different network architectures, learning rates, frame skipping), but still no success. Has anyone tried this code for Breakout? What parameters did you use? Since I have limited computational power, it is hard for me to run many tests, which forced me to post this question.
Thank you all!