Might have been calculating the Q-learning loss wrong

thushv89 / AdaCNN

AdaCNN algorithm. Clean implementation


thushv89 opened this issue 6 years ago

thushv89 commented 6 years ago

Previously I was computing the loss over all the actions for each experience tuple, but it seems I should have optimized only a single action (the one actually taken) per experience tuple.
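A minimal sketch of the difference, assuming the standard one-step Q-learning target r + γ·max_a′ Q(s′, a′) and NumPy arrays with one row per experience tuple. The issue does not show AdaCNN's actual code, so the function names and the exact form of the "all actions" variant (regressing the same target onto every action's Q-value) are hypothetical illustrations, not the repository's implementation.

```python
import numpy as np

def td_targets(rewards, q_next, gamma, dones):
    # One-step Q-learning target: r + gamma * max_a' Q(s', a'), with the
    # bootstrap term zeroed out for terminal transitions (dones == 1).
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

def q_loss_taken_action(q_pred, actions, rewards, q_next, dones, gamma=0.9):
    # Correct variant: only Q(s, a) for the action actually taken in each
    # experience tuple contributes to the loss (and hence receives gradient).
    targets = td_targets(rewards, q_next, gamma, dones)
    q_taken = q_pred[np.arange(len(actions)), actions]
    return float(np.mean((q_taken - targets) ** 2))

def q_loss_all_actions(q_pred, rewards, q_next, dones, gamma=0.9):
    # Hypothetical "previous" variant: the same target is regressed onto
    # every action's Q-value, which pulls untaken actions toward it too.
    targets = td_targets(rewards, q_next, gamma, dones)
    return float(np.mean((q_pred - targets[:, None]) ** 2))

# Two experience tuples, two actions each (toy numbers).
q_pred = np.array([[1.0, 2.0], [0.5, 0.0]])   # Q(s, .) from the network
actions = np.array([1, 0])                     # action taken in each tuple
rewards = np.array([1.0, 0.0])
q_next = np.array([[0.0, 1.0], [2.0, 0.0]])   # Q(s', .) for the next states
dones = np.array([0.0, 1.0])                   # second transition is terminal

loss_taken = q_loss_taken_action(q_pred, actions, rewards, q_next, dones, gamma=0.5)
loss_all = q_loss_all_actions(q_pred, rewards, q_next, dones, gamma=0.5)
print(loss_taken, loss_all)  # 0.25 0.1875
```

The two losses disagree because the all-actions version also penalizes Q-values of actions that were never taken, so those estimates get dragged toward a target that says nothing about them.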

thushv89 commented 6 years ago

A plot comparing the Q-values obtained with the new Q loss against those from the previous Q loss.