I am able to get the old performance with 0.5.3.
I believe the issue may be that we decoupled sampling speed from training. In old versions of RLlib, sampling was throttled so that the ratio of sampled timesteps to trained timesteps stayed roughly constant. Hence, performance now looks much worse sample-wise after https://github.com/ray-project/ray/pull/3212
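For context, here is a rough sketch of the kind of throttling the old versions applied; the helper names and the target ratio are illustrative only, not the actual RLlib internals:

```python
import time

# Illustrative sketch only (not actual RLlib code): pause sampling whenever the
# number of sampled env steps gets too far ahead of the number of trained-on
# replay steps, so the sample : train ratio stays roughly constant.
TARGET_SAMPLE_TRAIN_RATIO = 4.0   # hypothetical target ratio
SAMPLE_BATCH_SIZE = 20
TRAIN_BATCH_SIZE = 512

sampled_steps = 0
trained_steps = 0

def sample_step():
    """Collect one sample batch, backing off while the learner is behind."""
    global sampled_steps
    while trained_steps > 0 and sampled_steps / trained_steps > TARGET_SAMPLE_TRAIN_RATIO:
        time.sleep(0.001)  # in the real async setup the learner advances concurrently
    sampled_steps += SAMPLE_BATCH_SIZE   # stand-in for an actual env rollout

def train_step():
    """Do one SGD update on a replay batch (stand-in for the real learner)."""
    global trained_steps
    trained_steps += TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for _ in range(100):
        sample_step()
        train_step()
    print(sampled_steps, trained_steps)
```

In the real asynchronous setup, the check would run in the sampling path while the learner advances the trained-step counter concurrently; the sketch just shows the invariant being enforced.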
Ok, this is a false alarm. The following config reproduces the previously reported performance:
apex:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
            - BeamRiderNoFrameskip-v4
            - QbertNoFrameskip-v4
            - SpaceInvadersNoFrameskip-v4
    run: APEX
    config:
        double_q: false
        dueling: false
        num_atoms: 1
        noisy: false
        n_step: 3
        lr: .0001
        adam_epsilon: .00015
        hiddens: [512]
        buffer_size: 1000000
        schedule_max_timesteps: 2000000
        exploration_final_eps: 0.01
        exploration_fraction: .1
        prioritized_replay_alpha: 0.5
        beta_annealing_fraction: 1.0
        final_prioritized_replay_beta: 1.0
        # gpu: true
        num_gpus: 1
        # APEX
        num_workers: 8
        num_envs_per_worker: 8
        # sample_batch_size: 158
        sample_batch_size: 20
        train_batch_size: 512
        target_network_update_freq: 50000
        timesteps_per_iteration: 25000
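For reference, a config in this format can be launched through Tune directly; a minimal sketch, assuming the YAML above is saved as a hypothetical apex-atari.yaml and a Ray/RLlib version from this era (the `rllib train -f` script does roughly the same thing, including moving the top-level `env` key into the trainer config):

```python
# Minimal launch sketch (assumes the YAML above is saved as "apex-atari.yaml",
# a hypothetical filename).
import yaml
import ray
from ray import tune

with open("apex-atari.yaml") as f:
    experiments = yaml.safe_load(f)

for spec in experiments.values():
    # `rllib train -f` moves the top-level `env` key into the trainer config;
    # do the same here before handing the spec to Tune.
    spec["config"]["env"] = spec.pop("env")

ray.init()
tune.run_experiments(experiments)
```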
Breakout: (training curve)
It looks like Ape-X trains much more slowly now, as of at least 0.7.2. For example, it now takes ~80M timesteps to reach 100 reward on Breakout, which used to be achieved in <10M steps according to the recorded results: https://github.com/ray-project/rl-experiments/tree/master/atari-apex
However, DQN performance seems fine (0.7.5), which suggests it might be a hyperparameter change or an issue with how Ape-X episode rewards are computed (only rewards from the low-epsilon workers should be counted). It is also possible that it is some other, trickier issue.
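For context on the low-epsilon point: Ape-X gives each worker its own fixed exploration epsilon (the paper uses eps_i = eps^(1 + alpha * i / (N - 1)) with eps = 0.4 and alpha = 7), so reported episode rewards are only meaningful if they come from the near-greedy workers. A rough sketch of that filtering; the cutoff and helper names are made up, not RLlib's actual metrics code:

```python
from typing import Dict, List

# Illustrative sketch (not RLlib's metrics code): per-worker Ape-X epsilons and
# a mean episode reward computed only over the near-greedy (low-epsilon) workers.
NUM_WORKERS = 8
BASE_EPS = 0.4         # epsilon from the Ape-X paper
ALPHA = 7.0            # alpha from the Ape-X paper
LOW_EPS_CUTOFF = 0.01  # hypothetical cutoff for "low-epsilon" workers

def worker_epsilon(i: int, n: int = NUM_WORKERS) -> float:
    """Ape-X schedule: eps_i = eps ** (1 + alpha * i / (n - 1))."""
    return BASE_EPS ** (1 + ALPHA * i / (n - 1))

def low_eps_mean_reward(rewards_by_worker: Dict[int, List[float]]) -> float:
    """Average episode reward, counting only workers below the epsilon cutoff."""
    kept = [
        r
        for i, rs in rewards_by_worker.items()
        if worker_epsilon(i) <= LOW_EPS_CUTOFF
        for r in rs
    ]
    return sum(kept) / len(kept) if kept else float("nan")

if __name__ == "__main__":
    print([round(worker_epsilon(i), 4) for i in range(NUM_WORKERS)])
    # -> [0.4, 0.16, 0.064, 0.0256, 0.0102, 0.0041, 0.0016, 0.0007]
```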