pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License

ActionValue classes and backward hooks cause an error when used together #95

Closed: keisukefukuda closed this issue 1 year ago

keisukefukuda commented 3 years ago

Abstract

ActionValue classes and backward hooks cause an error when used together.

Details

Several predefined models in PFRL, such as FCQuadraticStateQFunction, return an ActionValue from their forward functions. However, when a backward hook is registered, Torch expects the return value of forward to be a Tensor or a dict (whose values() contain at least one Tensor). If an ActionValue is returned instead, the loop in _call_impl repeatedly applies var = var[0] to the value and eventually raises an error like this:

  File "/.../site-packages/torch/nn/modules/module.py", line 739, in _call_impl
    var = var[0]
  File "/.../pfrl/pfrl/action_value.py", line 316, in __getitem__
    max_action=self.max_action,
  File "/.../pfrl/pfrl/action_value.py", line 267, in __init__
    self.batch_size = self.mu.shape[0]
IndexError: tuple index out of range
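
The failure mode in the traceback can be reproduced without PyTorch or PFRL. The following is a minimal sketch, not the actual library code: FakeTensor, QuadraticLikeActionValue, and find_first_tensor are hypothetical stand-ins written for illustration, and the loop is modeled on the behavior described in this issue rather than copied from module.py. It assumes that indexing an ActionValue-like object returns another such object built from the indexed mu, and that the constructor reads mu.shape[0].

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only tracks a shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __getitem__(self, index):
        # Indexing drops the leading dimension, as with a real tensor.
        return FakeTensor(self.shape[1:])


class QuadraticLikeActionValue:
    """Hypothetical simplification of an ActionValue: indexable, but not a tensor."""

    def __init__(self, mu):
        self.mu = mu
        # Fails with IndexError once mu has no dimensions left.
        self.batch_size = self.mu.shape[0]

    def __getitem__(self, index):
        # Indexing yields another action value, never a bare tensor.
        return QuadraticLikeActionValue(self.mu[index])


def find_first_tensor(result):
    """Unwrap `result` until a tensor is found (modeled on the loop described above)."""
    var = result
    while not isinstance(var, FakeTensor):
        if isinstance(var, dict):
            var = next(v for v in var.values() if isinstance(v, FakeTensor))
        else:
            var = var[0]
    return var


if __name__ == "__main__":
    av = QuadraticLikeActionValue(FakeTensor((4, 2)))
    try:
        find_first_tensor(av)
    except IndexError as exc:
        print("unwrapping an action value failed:", exc)
    # A tensor or a dict containing a tensor unwraps without error.
    print(find_first_tensor({"mu": av.mu}).shape)
```

In this sketch the first var[0] still succeeds because mu has a batch dimension; the second var[0] leaves mu with an empty shape, so the constructor's shape[0] lookup raises the same "tuple index out of range" error seen in the traceback.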