With experience, we need a list of states, actions and rewards. If we have a model of behaviour too, then we can do Sarsa and vectorize it.
With a list of states and a model... We can use the model to generate the actions and rewards for a given state. From this we can calculate the targets.
We can learn using experience or a model.
With experience, we need a list of states, actions and rewards. If we have a model of behaviour too, then we can do Sarsa and vectorize it.
With a list of states and a model... We can use the model to generate the actions and rewards for a given state. From this we can calculate the targets.