Experience Generator Class

We need an experience generator class.

When initializing the value function, we will fit it to the rewards. For this stage the experience generator will use a policy that moves randomly.

When generating experience to fit the value function, we might instead step greedily towards the maximum immediate reward, as this avoids having to call the value function.

When training online, we might want to choose actions using the value function.
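
A rough sketch of how the three cases above could share one class, with the policy passed in as a function. Everything here is an assumption made for illustration, not existing code in this repo: the `LineWorld` toy environment, its stateless `step(state, action)` method, the policy function names, and the discount `gamma`.

```python
import random
from typing import Callable, List, Tuple

# (state, action, reward, next_state)
Transition = Tuple[int, int, float, int]


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1 on a line.

    Landing on the last position gives reward 1.0, anything else 0.0.
    step() is stateless, so policies can call it to look one step ahead.
    """

    actions = (-1, 1)

    def __init__(self, size: int = 5):
        self.size = size

    def step(self, state: int, action: int) -> Tuple[int, float]:
        next_state = min(max(state + action, 0), self.size - 1)
        reward = 1.0 if next_state == self.size - 1 else 0.0
        return next_state, reward


def random_policy(env: LineWorld, state: int) -> int:
    """Move randomly (for the initial fit of the value function to the rewards)."""
    return random.choice(env.actions)


def greedy_reward_policy(env: LineWorld, state: int) -> int:
    """Pick the action with the largest immediate reward; no value function call."""
    return max(env.actions, key=lambda a: env.step(state, a)[1])


def make_value_policy(value_fn: Callable[[int], float], gamma: float = 0.9):
    """Build a policy that picks the action maximising reward + gamma * value."""

    def policy(env: LineWorld, state: int) -> int:
        def score(action: int) -> float:
            next_state, reward = env.step(state, action)
            return reward + gamma * value_fn(next_state)

        return max(env.actions, key=score)

    return policy


class ExperienceGenerator:
    """Rolls out a policy in an environment and records the transitions."""

    def __init__(self, env: LineWorld, policy: Callable[[LineWorld, int], int]):
        self.env = env
        self.policy = policy

    def generate(self, start_state: int, n_steps: int) -> List[Transition]:
        state = start_state
        experience: List[Transition] = []
        for _ in range(n_steps):
            action = self.policy(self.env, state)
            next_state, reward = self.env.step(state, action)
            experience.append((state, action, reward, next_state))
            state = next_state
        return experience
```

Usage might look like `ExperienceGenerator(LineWorld(), random_policy).generate(start_state=0, n_steps=100)`. Swapping in `greedy_reward_policy` or `make_value_policy(value_fn)` changes how actions are chosen without touching the generator itself.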
