Open · avnishn opened this issue 3 years ago
@krzentner was able to fix the contributor's error by adding the following argument to TRPO's optimizer:
optimizer_args=dict(hvp_approach=FiniteDifferenceHVP(base_eps=1e-5))
Is there a reason why we would need this? Is it specific to TRPO, and if so, can we modify TRPO to pass it by default?
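For context, here is a rough sketch of how that argument would be passed in a full TRPO launcher on the tf branch. The import paths, the TRPO constructor arguments, and the `env`/`baseline` setup are assumptions based on the snippet above, not a verified repro:

```python
# Sketch only: import paths and constructor arguments are assumed from the
# snippet in this issue and may vary between garage versions.
from garage.tf.algos import TRPO
from garage.tf.optimizers import FiniteDifferenceHVP
from garage.tf.policies import CategoricalGRUPolicy

# `env` and `baseline` are assumed to be constructed elsewhere in the launcher.
policy = CategoricalGRUPolicy(env_spec=env.spec)

algo = TRPO(
    env_spec=env.spec,
    policy=policy,
    baseline=baseline,
    # The workaround: approximate the Hessian-vector product with finite
    # differences instead of the conjugate gradient optimizer's default.
    optimizer_args=dict(hvp_approach=FiniteDifferenceHVP(base_eps=1e-5)),
)
```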
If the CG optimizer can't be used with RNNs (I don't think that's actually the case), we should detect that and raise an error.
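A rough sketch of what that early check could look like (hypothetical, not existing garage code; the `recurrent` attribute and the helper name are assumed):

```python
# Hypothetical sketch of the check proposed above; not existing garage code.
# Assumes policies expose a `recurrent` flag, as rllab-style policies do, and
# reuses the FiniteDifferenceHVP class from the snippet above.
def validate_hvp_for_policy(policy, hvp_approach):
    """Fail fast if a recurrent policy is paired with an unsupported HVP approach."""
    if getattr(policy, 'recurrent', False) and not isinstance(
            hvp_approach, FiniteDifferenceHVP):
        raise ValueError(
            'Recurrent policies currently fail with the default HVP approach; '
            'pass optimizer_args=dict(hvp_approach=FiniteDifferenceHVP(...)) '
            'to TRPO instead.')
```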
The error a contributor got when using the CategoricalGRUPolicy with TRPO on the tf branch, while computing backward passes, was: