Hi @kengz,
Sorry for the delay.
The cross-entropy method I refer to is the one described here: Cross-Entropy Method (section "2. The Cross-Entropy Method for Optimization")
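For readers unfamiliar with it, the core loop of that optimization variant can be sketched as follows. This is a minimal generic illustration (Gaussian sampling, elite selection, distribution refit), not the RL-specific variant implemented in this PR:

```python
import numpy as np

def cross_entropy_method(f, mu, sigma, n_samples=50, elite_frac=0.2, n_iters=30):
    """Minimal sketch of the cross-entropy method for (maximization)
    optimization: sample candidates from a Gaussian, keep the elite
    fraction with the highest scores, refit the Gaussian to the elites."""
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        samples = np.random.normal(mu, sigma, size=(n_samples, len(mu)))
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # best-scoring candidates
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

np.random.seed(0)  # for reproducibility of the sketch
# maximize f(x) = -||x - 3||^2, whose optimum is x = [3, 3]
best = cross_entropy_method(lambda x: -np.sum((x - 3.0) ** 2),
                            mu=np.zeros(2), sigma=np.full(2, 5.0))
```

The distribution concentrates around the best candidates over iterations, so `best` ends up close to `[3, 3]`.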
Addition of the cross-entropy method
This is more an exercise for me to get used to the lab than a particularly useful algorithm, but it may still be interesting to some. To implement it, I have defined a new on-policy memory class called OnPolicyCrossEntropy, which inherits from OnPolicyReplay
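As a rough standalone sketch of what the elite-episode selection amounts to (the stub base class, its attributes, and the `add_episode`/`sample` names are hypothetical stand-ins for illustration; the real OnPolicyCrossEntropy builds on SLM Lab's OnPolicyReplay memory API):

```python
class OnPolicyReplayStub:
    """Hypothetical stand-in for SLM Lab's OnPolicyReplay:
    stores complete episodes until training time."""
    def __init__(self):
        self.episodes = []  # each episode is represented here as a list of rewards

    def add_episode(self, rewards):
        self.episodes.append(rewards)

class OnPolicyCrossEntropy(OnPolicyReplayStub):
    """Keeps only the elite fraction of stored episodes (those with the
    highest return) when a training batch is drawn, then clears the memory."""
    def __init__(self, elite_frac=0.5):
        super().__init__()
        self.elite_frac = elite_frac  # e.g. the 0.5 coefficient from the spec

    def sample(self):
        n_elite = max(1, int(len(self.episodes) * self.elite_frac))
        # rank episodes by total (undiscounted) return, keep the best ones
        ranked = sorted(self.episodes, key=sum, reverse=True)
        batch, self.episodes = ranked[:n_elite], []
        return batch
```

For example, with four stored episodes whose returns are 2, 5, 0 and 4 and `elite_frac=0.5`, `sample()` returns only the two highest-return episodes.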
Experiment Title
Abstract
Small experiment on the cartpole environment (CartPole-v0). As in the REINFORCE baseline spec, we allow a budget of 100000 frames, 4 sessions, and 1 trial. The training frequency is set to 16 and the cross_entropy coefficient to 0.5.
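For reference, the spec might look roughly as follows. This is a hedged sketch following the general SLM Lab spec layout (agent/env/meta sections); the exact field names for this experiment, in particular the `cross_entropy` coefficient key, are assumptions, not the PR's actual spec, and the `net` section is omitted:

```json
{
  "reinforce_cross_entropy_cartpole": {
    "agent": [{
      "name": "Reinforce",
      "algorithm": {
        "name": "Reinforce",
        "training_frequency": 16,
        "cross_entropy": 0.5
      },
      "memory": {"name": "OnPolicyCrossEntropy"}
    }],
    "env": [{"name": "CartPole-v0", "max_frame": 100000}],
    "meta": {"max_session": 4, "max_trial": 1}
  }
}
```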
Methodology
REINFORCE algorithm combined with the on-policy cross-entropy method
Reproduction
Run command:
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cross_entropy_cartpole train
Result and Discussion
The result on cartpole is not that good (compared to REINFORCE alone).
Data zipfile url: reinforce_cross_entropy_cartpole_2020_02_14_171517.zip