Open thushv89 opened 6 years ago
Previously I was using all the actions for a single experience tuple, but it seems I should have optimized only a single action per experience tuple.
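To make the difference concrete, here is a minimal sketch in plain Python of the two loss variants as I understand them. The function names, the squared-error form, and the example batch are all hypothetical and only for illustration; they are not the actual code from this project. The assumption is that the earlier loss penalised the Q-value of every action in a tuple, while the corrected loss only touches the Q-value of the action that was actually taken.

```python
def q_loss_all_actions(q_pred, q_target):
    """Earlier variant (assumed): squared error summed over every action's Q-value."""
    total = 0.0
    for pred_row, target_row in zip(q_pred, q_target):
        total += sum((p - t) ** 2 for p, t in zip(pred_row, target_row))
    return total / len(q_pred)


def q_loss_taken_action(q_pred, actions, targets):
    """Corrected variant (assumed): squared error only for the action taken."""
    total = 0.0
    for pred_row, action, target in zip(q_pred, actions, targets):
        total += (pred_row[action] - target) ** 2
    return total / len(q_pred)


# Hypothetical batch of two experience tuples with three actions each.
q_pred = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
actions = [1, 2]       # index of the action taken in each tuple
targets = [2.5, 0.0]   # e.g. r + gamma * max_a' Q(s', a'), assumed precomputed

loss = q_loss_taken_action(q_pred, actions, targets)
```

With the second variant, the Q-values of actions that were not taken in a given tuple contribute nothing to the loss, so they are not pulled towards a target that was never observed for them.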
[Plot: Q-values according to the new Q loss and the previous Q loss]