Closed captify-alapite closed 7 years ago
In the train method of the CEMLearner, there's the following check on lines 86-89:
    if elite_mean_reward > self.emulator.env.spec.reward_threshold:
        consecutive_successes += 1
    else:
        consecutive_successes = 0
Unfortunately, reward_threshold is often None (e.g. with Pendulum-v0). Under Python 2's ordering rules any number compares greater than None, so the inequality check always succeeds, consecutive_successes keeps growing, and CEM training halts prematurely.
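For illustration, a None-safe version of the check could look like the sketch below. This is not necessarily what the fix commit does; the helper names `threshold_reached` and `update_consecutive_successes` are hypothetical, and the sketch assumes an environment without a threshold should never count as a success.

```python
def threshold_reached(elite_mean_reward, reward_threshold):
    """Return True only when a threshold is defined and has been exceeded.

    Hypothetical helper: environments such as Pendulum-v0 define no
    reward_threshold, so the spec value is None and must be handled first.
    """
    if reward_threshold is None:
        # No threshold defined: never treat the iteration as a success.
        return False
    return elite_mean_reward > reward_threshold


def update_consecutive_successes(consecutive_successes, elite_mean_reward,
                                 reward_threshold):
    """Mirror the counter logic from the issue, using the None-safe check."""
    if threshold_reached(elite_mean_reward, reward_threshold):
        return consecutive_successes + 1
    return 0
```

In the learner this would be called with `self.emulator.env.spec.reward_threshold` as the last argument. With a missing threshold the counter stays at 0, so training runs for its full number of iterations instead of stopping early.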
Fixed: de2e6b324ab3c8b6139f898cf0e40357541986ff