microsoft / oac-explore

Code accompanying the paper "Better Exploration with Optimistic Actor Critic" (NeurIPS 2019)
MIT License

feature request: batched get_optimistic_exploration_action #3

Open samuelstanton opened 4 years ago

samuelstanton commented 4 years ago

Would it be straightforward to implement a batched version of get_optimistic_exploration_action?

quanvuong commented 4 years ago

Hi Samuel,

It is doable (a few hours' effort), but not straightforward (i.e. not a < 20 minute change).

This is because:

  1. the function get_optimistic_exploration_action requires computing a backward pass and a per-sample gradient (not the sum of gradients over a batch of inputs).
  2. AFAIK, PyTorch doesn't support computing per-sample gradients.
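
To make the distinction in point 1 concrete, here is a minimal framework-free sketch of per-sample gradients versus the summed gradient. Everything in it is an assumption for illustration: the toy function q(w, x) = (w . x)^2, the shared vector w, and the helper names are hypothetical and are not taken from the repository or from PyTorch.

```python
# Toy stand-in (hypothetical): q(w, x) = (w . x)^2, with w shared across samples.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def grad_q_wrt_w(w, x):
    """Analytic gradient of (w . x)^2 with respect to w, i.e. 2 (w . x) x."""
    s = dot(w, x)
    return [2.0 * s * xi for xi in x]

def per_sample_grads(w, batch):
    """One gradient vector per sample: the quantity point 1 asks for."""
    return [grad_q_wrt_w(w, x) for x in batch]

def summed_grad(w, batch):
    """Gradient of sum_i q(w, x_i): what a single backward pass over a
    summed batch objective gives for a shared w."""
    grads = per_sample_grads(w, batch)
    return [sum(column) for column in zip(*grads)]

w = [1.0, -2.0]
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

per_sample = per_sample_grads(w, batch)  # three vectors, one per sample
total = summed_grad(w, batch)            # a single vector; per-sample detail is lost
print(per_sample)
print(total)
```

The summed gradient cannot be split back into its per-sample terms, so a batched version has to produce the individual gradients some other way, for example with one backward pass per sample.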