Hi @kengz,
Sorry for the delay.
The cross-entropy method I refer to is the one described here: Cross-Entropy Method (section "2. The Cross-Entropy Method for Optimization")
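For readers unfamiliar with it, the core loop of that optimization variant can be sketched as follows. This is a minimal generic illustration (Gaussian sampling, elite selection, distribution refit), not the RL-specific variant implemented in this PR:

```python
import numpy as np

def cross_entropy_method(f, mu, sigma, n_samples=50, elite_frac=0.2, n_iters=30):
    """Minimal sketch of the cross-entropy method for (maximization)
    optimization: sample candidates from a Gaussian, keep the elite
    fraction with the highest scores, refit the Gaussian to the elites."""
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        samples = np.random.normal(mu, sigma, size=(n_samples, len(mu)))
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # best-scoring candidates
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

np.random.seed(0)  # for reproducibility of the sketch
# maximize f(x) = -||x - 3||^2, whose optimum is x = [3, 3]
best = cross_entropy_method(lambda x: -np.sum((x - 3.0) ** 2),
                            mu=np.zeros(2), sigma=np.full(2, 5.0))
```

The distribution concentrates around the best candidates over iterations, so `best` ends up close to `[3, 3]`.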
Addition of the cross-entropy method
This is more an exercise for me to get used to the lab than a particularly useful algorithm, but it may still be interesting to some. To implement it, I have defined a new on-policy memory class called OnPolicyCrossEntropy, which inherits from OnPolicyReplay
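As a rough standalone sketch of what the elite-episode selection amounts to (the stub base class, its attributes, and the `add_episode`/`sample` names are hypothetical stand-ins for illustration; the real OnPolicyCrossEntropy builds on SLM Lab's OnPolicyReplay memory API):

```python
class OnPolicyReplayStub:
    """Hypothetical stand-in for SLM Lab's OnPolicyReplay:
    stores complete episodes until training time."""
    def __init__(self):
        self.episodes = []  # each episode is represented here as a list of rewards

    def add_episode(self, rewards):
        self.episodes.append(rewards)

class OnPolicyCrossEntropy(OnPolicyReplayStub):
    """Keeps only the elite fraction of stored episodes (those with the
    highest return) when a training batch is drawn, then clears the memory."""
    def __init__(self, elite_frac=0.5):
        super().__init__()
        self.elite_frac = elite_frac  # e.g. the 0.5 coefficient from the spec

    def sample(self):
        n_elite = max(1, int(len(self.episodes) * self.elite_frac))
        # rank episodes by total (undiscounted) return, keep the best ones
        ranked = sorted(self.episodes, key=sum, reverse=True)
        batch, self.episodes = ranked[:n_elite], []
        return batch
```

For example, with four stored episodes whose returns are 2, 5, 0 and 4 and `elite_frac=0.5`, `sample()` returns only the two highest-return episodes.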
Experiment Title
Abstract
Small experiment on the cartpole environment (CartPole-v0). As in the REINFORCE baseline spec, we allow a budget of 100000 frames, 4 sessions, and 1 trial. The training frequency is set to 16 and the cross_entropy coefficient to 0.5.
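For reference, the spec might look roughly as follows. This is a hedged sketch following the general SLM Lab spec layout (agent/env/meta sections); the exact field names for this experiment, in particular the `cross_entropy` coefficient key, are assumptions, not the PR's actual spec, and the `net` section is omitted:

```json
{
  "reinforce_cross_entropy_cartpole": {
    "agent": [{
      "name": "Reinforce",
      "algorithm": {
        "name": "Reinforce",
        "training_frequency": 16,
        "cross_entropy": 0.5
      },
      "memory": {"name": "OnPolicyCrossEntropy"}
    }],
    "env": [{"name": "CartPole-v0", "max_frame": 100000}],
    "meta": {"max_session": 4, "max_trial": 1}
  }
}
```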
Methodology
REINFORCE algorithm combined with the on-policy cross-entropy method
Reproduction
Run command:
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cross_entropy_cartpole train
Result and Discussion
The result on cartpole is not that good (compared to REINFORCE alone).
Data zipfile url: reinforce_cross_entropy_cartpole_2020_02_14_171517.zip