Also relevant reference: https://github.com/hill-a/stable-baselines
Just ran a "30% full speed" IMPALA across a couple of environments. The results are pretty reasonable at 40M frames, with Qbert / Space Invaders roughly in line with the results from the A3C paper, and Breakout / BeamRider a bit below. Note that the episode max reward for Breakout and BeamRider is pretty good, but the mean is not quite up there.
I'm guessing we can improve on this with some tuning.
# Runs on a single g3.16xl node
atari-impala:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
            - BeamRiderNoFrameskip-v4
            - QbertNoFrameskip-v4
            - SpaceInvadersNoFrameskip-v4
    run: IMPALA
    config:
        sample_batch_size: 250  # 50 * num_envs_per_worker
        train_batch_size: 500
        num_workers: 12
        num_envs_per_worker: 5
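For anyone reproducing this from Python instead of the YAML runner, here is a rough equivalent using Tune's `run_experiments` API. This is only a sketch; the exact Tune entry points and whether `env` goes at the top level or inside `config` may differ across Ray versions, and the YAML above is the authoritative config.

```python
# Sketch: launching the same IMPALA grid search from Python via Tune.
# API details (run_experiments, grid_search-as-dict) are assumptions about
# the Ray/Tune version in use.
import ray
from ray.tune import run_experiments

ray.init()

run_experiments({
    "atari-impala": {
        "run": "IMPALA",
        "config": {
            "env": {"grid_search": [
                "BreakoutNoFrameskip-v4",
                "BeamRiderNoFrameskip-v4",
                "QbertNoFrameskip-v4",
                "SpaceInvadersNoFrameskip-v4",
            ]},
            "sample_batch_size": 250,  # 50 * num_envs_per_worker
            "train_batch_size": 500,
            "num_workers": 12,
            "num_envs_per_worker": 5,
        },
    },
})
```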
In what format does it make sense to publish the results? E.g., a collection of full learning curves (e.g., as CSV)? Or actual visualizations like you have above? Or something else?
If we have a public ray perf dashboard, that would be a good place to put these.
Otherwise, I think posting some summary visualizations on github or the docs would do (for example, just having the tuned example yamls with pointers to this issue). The full learning curve data probably isn't that interesting, but we could also upload that to S3 pretty easily.
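If we do upload the raw curves, one low-effort option is to collect Tune's per-trial progress.csv files into a single CSV. A minimal sketch, assuming the default ~/ray_results layout and the standard RLlib result columns (timesteps_total, episode_reward_mean); the experiment name and output filename are placeholders:

```python
# Sketch: gather per-trial learning curves from Tune's progress.csv files
# into one CSV for publishing. Paths and column names are assumptions based
# on the default ~/ray_results layout and standard RLlib result fields.
import glob
import os

import pandas as pd

frames = []
for path in glob.glob(os.path.expanduser("~/ray_results/atari-impala/*/progress.csv")):
    df = pd.read_csv(path)
    df["trial"] = os.path.basename(os.path.dirname(path))  # trial directory name
    frames.append(df[["trial", "timesteps_total", "episode_reward_mean"]])

pd.concat(frames, ignore_index=True).to_csv("atari_impala_curves.csv", index=False)
```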
Do you have any results for A3C or A3C-LSTM?
I did an initial run with A3C, but the results were much worse than the IMPALA ones. I didn't try tuning the learning rate as described in the A3C paper, though.
A3C is very sensitive to the learning rate, since the staleness of the gradients increases with the learning rate.
For reference, here is the run and params (with the default lr=0.0001 and grad_clip=40.0). Note that the gradient magnitude scales with lr * batch size (the batch size here is 20).
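To make that scaling concrete, here is a trivial back-of-the-envelope comparison; only the first pair is the default from this run, the other values are purely illustrative:

```python
# The size of each async update scales roughly with lr * sample_batch_size.
# (0.0001, 20) is the default above; the other pairs are illustrative
# alternatives that keep the product (and hence the update size) comparable.
for lr, batch in [(0.0001, 20), (0.00005, 40), (0.0002, 10)]:
    print("lr=%g batch=%d -> lr * batch = %g" % (lr, batch, lr * batch))
```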
This is also on this branch: https://github.com/ray-project/ray/pull/2679
# Runs on a single m4.16xl node
atari-a3c:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
            - BeamRiderNoFrameskip-v4
            - QbertNoFrameskip-v4
            - SpaceInvadersNoFrameskip-v4
    run: A3C
    config:
        num_workers: 11
        sample_batch_size: 20
        optimizer:
            grads_per_step: 1000
That PR also adds A2C. Since A2C is deterministic (it applies synchronous updates), it should be easy to copy hyperparameters from another A2C implementation and compare results (I'm doing some runs right now, but it might take a while).
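For reference, a rough sketch of how the commonly published A2C defaults (from openai/baselines: 16 parallel envs, 5-step rollouts, lr 7e-4 with RMSProp, value loss coeff 0.5, entropy bonus 0.01, global grad-norm clip 0.5) might be expressed with the RLlib config keys used in this thread. This is an assumption-laden starting point for comparison, not a configuration that was validated here:

```python
# Hypothetical mapping of baselines' A2C defaults onto the RLlib config keys
# used elsewhere in this thread -- a starting point for comparison only.
a2c_like_config = {
    "num_workers": 16,            # 16 parallel actors, one env each
    "num_envs_per_worker": 1,
    "sample_batch_size": 5,       # 5-step rollouts per env
    "train_batch_size": 80,       # 16 envs * 5 steps per update
    "lr": 0.0007,
    "vf_loss_coeff": 0.5,
    "entropy_coeff": 0.01,        # some RLlib versions use a negative sign convention
    "grad_clip": 0.5,             # baselines clips the global grad norm at 0.5
    "gamma": 0.99,
    "preprocessor_pref": "deepmind",
}
```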
You are using 11 workers for the experiment; I would recommend 16 workers.
One discovery: we're handling EpisodicLifeEnv resets incorrectly. For example, in BeamRider you get three lives, which we are treating as three separate episodes, but they are supposed to count as one.
This kind of explains why BeamRider's starting score is about 3x too low.
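For context, the usual episodic-life wrapper (in the style of the openai/baselines Atari wrappers) ends a training episode on every life loss but only truly resets the game once all lives are gone, so reported scores should accumulate across all lives of one real game. A condensed sketch of that idea, assuming the classic gym API and an ALE-backed env:

```python
import gym


class EpisodicLifeEnv(gym.Wrapper):
    """Baselines-style sketch: a lost life ends the *training* episode, but
    the game only really resets once all lives are gone. For reporting, the
    score of one real game should span all of its lives."""

    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            done = True  # lost a life: end the training episode early
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            obs = self.env.reset(**kwargs)  # all lives gone: real reset
        else:
            obs, _, _, _ = self.env.step(0)  # mid-game: just take a no-op
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```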
@luochao1024 this PR reproduces standard Atari results for IMPALA and A2C: https://github.com/ray-project/ray/pull/2700
I'm still having trouble finding the right hyperparams for A3C (vf_explained_var tends to dive to <0 with A3C, whereas it is always close to 1 with A2C / IMPALA), but since it works in A2C it's probably just a matter of tweaking the lr / batch size / grad clipping.
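For anyone watching this metric: vf_explained_var is the usual explained-variance diagnostic for the critic, roughly 1 - Var(returns - value_predictions) / Var(returns), so values near 1 mean the value function tracks the returns and values at or below 0 mean it does no better than predicting the mean. A minimal numpy sketch of the quantity (not RLlib's exact implementation):

```python
import numpy as np


def explained_variance(returns, value_preds):
    """1 - Var(returns - value_preds) / Var(returns). Close to 1 means a good
    critic; at or below 0 means it's no better than predicting the mean."""
    var_returns = np.var(returns)
    if var_returns == 0:
        return float("nan")
    return 1.0 - np.var(returns - value_preds) / var_returns
```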
Do you have hyperparams that work well for A3C now?
I don't have the bandwidth to tune A3C right now, but if you want to give it a shot, perhaps starting from the A2C hyperparams with some lr adjustment could work?
@ericl Can you give it a try for BreakoutNoFrameskip-v4? I tried a grid search over the lr, but I still get some really bad results. Here are the configs I use:
atari-a3c:
    env: BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 8
        sample_batch_size: 20
        use_pytorch: false
        vf_loss_coeff: 0.5
        entropy_coeff: -0.01
        gamma: 0.99
        grad_clip: 40.0
        lambda: 1.0
        lr:
            grid_search:
                - 0.000005
                - 0.00001
                - 0.00005
                - 0.0001
        observation_filter: NoFilter
        preprocessor_pref: rllib
        num_envs_per_worker: 5
        optimizer:
            grads_per_step: 1000
You'll definitely need to use the deepmind preprocessors, since the rllib ones don't have the right episodic life wrappers. Perhaps we should remove those. Also, maybe don't use an LSTM, and start from the A2C config.
Now I am running A3C with the following config:
atari-a3c:
    env: BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 5
        sample_batch_size: 20
        preprocessor_pref: deepmind
        lr:
            grid_search:
                - 0.000005
                - 0.00001
                - 0.00005
                - 0.0001
                - 0.0005
                - 0.001
        num_envs_per_worker: 5
        optimizer:
            grads_per_step: 1000
Do you think the configs are reasonable now? I am also running BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, and SpaceInvadersNoFrameskip-v4 at the same time. I will report back when the training finishes.
There's one weird thing where num_envs_per_worker will reduce your effective unroll length per env (so 20 / 5 = an unroll length of 4). Just watch out for that; you might consider trying 1 env per worker instead, or setting sample_batch_size=50 for a longer unroll.
Beyond that, the config looks fine. Note that I found an lr schedule is important for some envs (but it's probably too much to try right now).
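To spell out that interaction, the per-env unroll length is roughly sample_batch_size divided by num_envs_per_worker; a quick illustration of the combinations mentioned above (pure arithmetic, no RLlib calls):

```python
# Effective per-env unroll length ~= sample_batch_size / num_envs_per_worker.
for sample_batch_size, num_envs in [(20, 5), (20, 1), (50, 5)]:
    print("sample_batch_size=%d, num_envs_per_worker=%d -> unroll length %d"
          % (sample_batch_size, num_envs, sample_batch_size // num_envs))
```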
The results seem normal now with num_workers=5. Learning curves (plots attached in the original issue) for BreakoutNoFrameskip-v4, SpaceInvadersNoFrameskip-v4, and QbertNoFrameskip-v4.
I will set the num_envs_per_worker=1 later
Closing this in favor of individual tickets. Main TODOs are the DQN family.
Describe the problem
We should publish results for at least a few of the standard Atari games on all applicable algorithms, and fix any discrepancies, e.g. https://github.com/ray-project/ray/issues/2654
Results uploaded to this repo: https://github.com/ray-project/rl-experiments
Envs to run: PongNoFrameskip-v4, BreakoutNoFrameskip-v4, BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, SpaceInvadersNoFrameskip-v4
(Chosen such that all but Pong can run concurrently on a g3.16xl machine.)
Some references: https://github.com/btaba/yarlp https://github.com/openai/baselines/issues/176