Open samuelstanton opened 4 years ago
Hi Samuel,
It is doable (a few hours' effort), but not straightforward (i.e., not something that takes under 20 minutes).
This is because `get_optimistic_exploration_action` requires one backward pass and a per-sample gradient (not the sum of gradients over a batch of inputs).
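To illustrate the distinction, here is a minimal PyTorch sketch. It rests on an assumption the thread does not confirm: that the gradient in question is taken with respect to each sample's own input (for example, its action) and that the critic treats batch rows independently. Under that assumption, one backward pass through the summed outputs already yields per-sample gradients, because sample `i`'s output does not depend on sample `j`'s input. The names `batched_input_gradients` and `toy_q` are hypothetical and are not taken from the repository.

```python
import torch


def batched_input_gradients(q_fn, actions):
    """Return dQ(a_i)/da_i for every row a_i of `actions` in one backward pass.

    Assumes q_fn maps a (batch, dim) tensor to a (batch,) tensor and treats
    rows independently (no batch norm or other cross-sample coupling).
    """
    actions = actions.detach().requires_grad_(True)
    q_values = q_fn(actions)  # shape: (batch,)
    # Summing is safe here: row i of the gradient only receives a
    # contribution from q_values[i].
    (grads,) = torch.autograd.grad(q_values.sum(), actions)
    return grads  # shape: (batch, dim)


# Hypothetical stand-in for a critic, used only for this illustration.
torch.manual_seed(0)
weights = torch.randn(3)


def toy_q(actions):
    return torch.tanh(actions @ weights)


actions = torch.randn(4, 3)

# One backward pass over the whole batch.
batched = batched_input_gradients(toy_q, actions)

# Reference: one backward pass per sample.
looped = torch.stack(
    [batched_input_gradients(toy_q, a.unsqueeze(0))[0] for a in actions]
)
```

This shortcut does not carry over to gradients with respect to shared parameters, where summing over the batch really does merge the per-sample gradients; that case needs a different per-sample gradient technique.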
Would it be straightforward to implement a batched version of `get_optimistic_exploration_action`?