rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License

update documentation on how to use rnns with tf/torch[pending] #2200

Open avnishn opened 3 years ago

avnishn commented 3 years ago

The error a contributor got when computing backward passes with CategoricalGRUPolicy and TRPO on the tf branch was:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'optimize/hx_plain/gradients_hx_plain/ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad_grad/ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad_grad': 
Connecting to invalid output 78 of source node ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad which has 78 outputs. 

Try using tf.compat.v1.experimental.output_all_intermediates(True)
avnishn commented 3 years ago

@krzentner

avnishn commented 3 years ago

I was able to fix the contributor's error by adding the following argument to TRPO's optimizer:

optimizer_args=dict(
    hvp_approach=FiniteDifferenceHVP(base_eps=1e-5))

Is there a reason why we would need this? Is it specific to TRPO, and if so, can we modify TRPO to use it by default?
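For context on why this workaround sidesteps the graph error above: a finite-difference HVP never asks TensorFlow to differentiate through the GRU's while-loop a second time. Instead of building an exact second-order graph, it approximates the Hessian-vector product from two first-order gradient evaluations. The sketch below is a minimal pure-Python illustration of that technique, not garage's actual FiniteDifferenceHVP implementation; the function names, the quadratic example, and the fixed-eps scheme are all illustrative assumptions.

```python
def hvp_finite_difference(grad_f, x, v, base_eps=1e-5):
    """Approximate H @ v as (grad_f(x + eps*v) - grad_f(x)) / eps.

    Only first-order gradients of f are evaluated, so no second
    differentiation pass through the model is ever constructed.
    (Illustrative sketch; garage's FiniteDifferenceHVP differs in detail.)
    """
    eps = base_eps
    g_plus = grad_f([xi + eps * vi for xi, vi in zip(x, v)])
    g0 = grad_f(x)
    return [(gp - g) / eps for gp, g in zip(g_plus, g0)]


# Toy check on a quadratic f(x) = 0.5 * x^T A x, where grad = A x
# and the Hessian is exactly A, so the true HVP is A @ v.
A = [[2.0, 0.5],
     [0.5, 1.0]]

def grad_f(x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

x = [1.0, -2.0]
v = [0.3, 0.7]
approx = hvp_finite_difference(grad_f, x, v)
exact = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
```

Because the toy objective is quadratic, the finite difference is exact up to floating-point rounding; on a real policy objective the approximation error scales with `base_eps`, which is why the fix above exposes it as a tunable argument.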

ryanjulian commented 3 years ago

If the CG optimizer can't be used with RNNs (I don't think that's actually the case), we should detect that and raise an error.