hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization
MIT License

Implement Value Function Approximation with Eligibility Traces #13

Open christianadriano opened 4 years ago

christianadriano commented 4 years ago

Use the True Online Sarsa(λ) algorithm from page 307 of Sutton & Barto.

There is similar code here, but it uses a neural network (we won't need a neural network, only a linear function approximator): https://github.com/dariopavllo/mountaincar-sarsa-lambda
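
As a starting point, here is a minimal sketch of True Online Sarsa(λ) with a linear action-value function, following the textbook update rules (dutch-style eligibility trace plus the Q_old correction term). Everything around the update itself is an assumption made for illustration and is not code from this repository: the `make_corridor` toy environment, the one-hot features, the ε-greedy policy, the function names, and the hyperparameter defaults.

```python
import numpy as np


def make_corridor(n_states=5):
    """Hypothetical toy task: walk right along a corridor to the last state."""
    n_actions = 2  # 0 = left, 1 = right
    n_features = n_states * n_actions

    def reset():
        return 0

    def step(s, a):
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == n_states - 1
        return s_next, -1.0, done  # reward of -1 per step

    def features(s, a):
        # One-hot over (state, action): the simplest linear feature map.
        x = np.zeros(n_features)
        x[s * n_actions + a] = 1.0
        return x

    return reset, step, features, n_features, n_actions


def true_online_sarsa_lambda(reset, step, features, n_features, n_actions,
                             episodes=300, alpha=0.1, gamma=0.99, lam=0.9,
                             epsilon=0.1, max_steps=1000, seed=0):
    """Learn weights w such that w @ features(s, a) approximates q(s, a)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)

    def choose(s):
        # Epsilon-greedy over the current linear estimates, random tie-breaking.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        qs = np.array([w @ features(s, a) for a in range(n_actions)])
        return int(rng.choice(np.flatnonzero(qs == qs.max())))

    for _ in range(episodes):
        s = reset()
        a = choose(s)
        x = features(s, a)
        z = np.zeros(n_features)  # eligibility trace
        q_old = 0.0
        for _ in range(max_steps):
            s_next, r, done = step(s, a)
            if done:
                x_next = np.zeros(n_features)  # terminal value is zero
            else:
                a_next = choose(s_next)
                x_next = features(s_next, a_next)
            q_cur = w @ x
            q_next = w @ x_next
            delta = r + gamma * q_next - q_cur
            # Dutch trace update, then the true online weight update.
            z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x
            w += alpha * (delta + q_cur - q_old) * z - alpha * (q_cur - q_old) * x
            q_old = q_next
            if done:
                break
            s, a, x = s_next, a_next, x_next
    return w


if __name__ == "__main__":
    reset, step, features, n_features, n_actions = make_corridor(5)
    w = true_online_sarsa_lambda(reset, step, features, n_features, n_actions)
    print(w.reshape(5, n_actions))  # rows: states, columns: left/right
```

For our models, the one-hot `features` function would be replaced by whatever feature vector we define over (state, action) pairs; the update loop itself should not need to change.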