Open terkkila opened 10 years ago
A pool of visited state-action-reward tuples that are used in mini-batches to update the value function.
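
A rough sketch of what such a pool could look like, in case it helps the discussion. This is only an illustration under assumptions, not the project's implementation: the names `Transition`, `ReplayPool`, `capacity`, and `seed` are hypothetical, the pool is assumed to be bounded (oldest tuples dropped first), and mini-batches are assumed to be drawn uniformly at random without replacement.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple, Optional


class Transition(NamedTuple):
    """One visited state-action-reward tuple (hypothetical layout)."""
    state: Any
    action: Any
    reward: float


class ReplayPool:
    """Bounded pool of visited tuples, sampled in mini-batches.

    Assumption: when the pool is full, the oldest tuple is evicted.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque with maxlen silently drops the oldest item on overflow.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, state: Any, action: Any, reward: float) -> None:
        """Store one visited tuple."""
        self._items.append(Transition(state, action, reward))

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a uniform mini-batch without replacement.

        Returns fewer than batch_size tuples if the pool is smaller.
        """
        k = min(batch_size, len(self._items))
        # Copy to a list so sampling does not index into the deque.
        return self._rng.sample(list(self._items), k)
```

A value-function update would then call `pool.sample(batch_size)` once per step and fit on the returned tuples instead of only the most recent one.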