feature request: batched get_optimistic_exploration_action

microsoft / oac-explore

Code accompanying the paper "Better Exploration with Optimistic Actor Critic" (NeurIPS 2019)

MIT License


Open samuelstanton opened 4 years ago

samuelstanton commented 4 years ago

Would it be straightforward to implement a batched version of get_optimistic_exploration_action?

quanvuong commented 4 years ago

Hi Samuel,

It is doable (a few hours' effort), but not straightforward (< 20 minutes).

This is because:

- The function get_optimistic_exploration_action requires one backward pass and a per-sample gradient (not the sum of gradients over a batch of inputs).
- AFAIK, PyTorch doesn't support computing per-sample gradients.
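
One possible route to a batched version, offered as a sketch rather than as the repository's implementation: in the OAC paper the gradient is taken with respect to the action (the pre-tanh policy mean), not the network parameters. Assuming the critics process each row of a batch independently (for example, no batch normalization), row i of the summed upper bound's gradient equals the gradient of sample i alone, so a single backward pass on the sum yields all per-sample gradients. Everything below is hypothetical and chosen for illustration: the stand-in critics `qf1` and `qf2`, the helper names, and the placeholder values of `beta_ub` and `delta`.

```python
import math
import torch

torch.manual_seed(0)
obs_dim, act_dim = 3, 2

def make_q():
    # Hypothetical stand-in critic; NOT the network used in oac-explore.
    return torch.nn.Sequential(
        torch.nn.Linear(obs_dim + act_dim, 16),
        torch.nn.Tanh(),
        torch.nn.Linear(16, 1),
    )

qf1, qf2 = make_q(), make_q()

def q_upper_bound(obs, pre_tanh_mu, beta_ub=1.0):
    """Upper bound on Q from two critics, one value per batch row.

    beta_ub=1.0 is a placeholder, not a tuned hyperparameter.
    """
    x = torch.cat([obs, torch.tanh(pre_tanh_mu)], dim=-1)
    q1, q2 = qf1(x), qf2(x)
    mean = (q1 + q2) / 2.0
    spread = torch.abs(q1 - q2) / 2.0
    return (mean + beta_ub * spread).squeeze(-1)

def batched_action_grad(obs, pre_tanh_mu):
    """Per-sample gradient of the upper bound w.r.t. each row's action."""
    mu = pre_tanh_mu.detach().clone().requires_grad_(True)
    q_ub = q_upper_bound(obs, mu)
    # Row i of q_ub depends only on row i of mu, so differentiating the
    # sum gives every per-sample gradient in one backward pass.
    (grad,) = torch.autograd.grad(q_ub.sum(), mu)
    return grad

def batched_optimistic_mean(obs, pre_tanh_mu, std, delta=1.0):
    """Shift each policy mean along its own gradient direction.

    Assumes a diagonal Gaussian policy; delta=1.0 is a placeholder.
    """
    grad = batched_action_grad(obs, pre_tanh_mu)
    var = std.pow(2)
    # One normalizer per row, with a small constant to avoid division by zero.
    denom = torch.sqrt((grad.pow(2) * var).sum(dim=-1, keepdim=True)) + 1e-6
    shift = math.sqrt(2.0 * delta) * var * grad / denom
    return pre_tanh_mu + shift

# Example call on random inputs (batch of 5).
obs = torch.randn(5, obs_dim)
mu_T = torch.randn(5, act_dim)
std = torch.rand(5, act_dim) + 0.1
mu_E = batched_optimistic_mean(obs, mu_T, std)
```

The independence assumption is what makes this work. If the critics mix information across batch rows, the rows of the summed gradient no longer match the single-sample gradients, and a true per-sample method would be needed instead.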