Open shu65 opened 4 years ago
It seems that the score of `train_dqn_gym.py` with `--actor-learner` is lower than the baseline score.

The result of `train_dqn_gym.py` with `--actor-learner`:

The result of the baseline (without `--actor-learner`):

Possible hypothesis: the actor-learner setup can collect data faster, but model updates are not always faster. After 100k steps, `n_updates` is 34203 with `--actor-learner` and 99001 without it, so it makes sense that the latter performs better. Thus, I'm not sure this is an issue that needs a fix.
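The hypothesis above can be sketched with a toy counter model (a hypothetical illustration, not PFRL's actual implementation): a synchronous DQN loop performs one gradient update per environment step, so `n_updates` tracks the step count, while an actor-learner setup keeps stepping the environment regardless of learner speed, so `n_updates` only grows as fast as the learner's own throughput. The `updates_per_step` ratio of 34/100 below is an assumed value chosen to match the rough order of magnitude reported here.

```python
def synchronous_run(n_steps):
    """Synchronous loop: one gradient update per environment step."""
    n_updates = 0
    for _ in range(n_steps):
        # collect one transition, then immediately do one update
        n_updates += 1
    return n_updates


def actor_learner_run(n_steps, upd_num=34, upd_den=100):
    """Actor-learner toy model: the actor steps freely, while the
    learner completes only upd_num/upd_den updates per env step
    (integer arithmetic keeps the simulation deterministic)."""
    acc = 0
    n_updates = 0
    for _ in range(n_steps):
        acc += upd_num          # learner "budget" earned this step
        n_updates += acc // upd_den  # spend whole updates
        acc %= upd_den          # carry the fractional remainder
    return n_updates


print(synchronous_run(100_000))    # 100000 updates after 100k steps
print(actor_learner_run(100_000))  # 34000 updates after the same 100k steps
```

With a learner that manages roughly 0.34 updates per environment step, the async run accumulates about 34k updates over 100k steps versus ~100k for the synchronous loop, which is consistent with the `n_updates` gap (34203 vs 99001) described above and with the lower score under `--actor-learner`.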