siwei0729 opened this issue 2 years ago (status: Open)
Could you plot the gradient norm as a function of trajectory length for, say, a random policy? We've seen this kind of thing before, and it usually reduces to chaotic/unstable dynamics in the system, so you may need to introduce a truncation length to get stable behavior.
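For reference, a minimal sketch of what that experiment could look like. It assumes Brax's `envs.create` / `env.reset` / `env.step` API; the fixed random linear policy, the horizon list, and the parameter scale are made up for illustration and are not from the original thread.

```python
# Sketch: gradient norm of the cumulative reward vs. rollout horizon,
# differentiated through the dynamics with a fixed random linear policy.
import jax
import jax.numpy as jnp
from brax import envs

env = envs.create('halfcheetah')

def rollout_return(params, rng, horizon):
    """Unrolled return of a linear policy action = tanh(W @ obs)."""
    state = env.reset(rng)
    total_reward = 0.0
    for _ in range(horizon):  # horizon is a Python int, so the loop unrolls
        action = jnp.tanh(params @ state.obs)
        state = env.step(state, action)
        total_reward += state.reward
    return total_reward

rng = jax.random.PRNGKey(0)
params = 0.01 * jax.random.normal(rng, (env.action_size, env.observation_size))

grad_fn = jax.grad(rollout_return)  # gradient w.r.t. the policy weights
for horizon in (8, 16, 32, 64, 128, 256):
    g = grad_fn(params, rng, horizon)
    print(horizon, float(jnp.linalg.norm(g)))
```

If the norm grows roughly exponentially with the horizon, that points to the chaotic/unstable dynamics mentioned above, and truncating the backpropagated horizon (e.g. stopping gradients every K steps) is the usual remedy.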
Hi,
I'm working on a project that uses differentiable dynamics. However, for the halfcheetah task I'm running into gradient explosion. I have created a repo to reproduce the problem.
The problem can be reproduced with the official analytical policy gradient implementation (apg.py) and the official reward function; the only change I made is printing out the gradient norm before clipping.
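For context, a hedged sketch of that kind of change (not the actual diff to apg.py; `loss_fn`, `params`, and the max-norm value of 1.0 are toy stand-ins), using optax to read the global gradient norm before clipping is applied:

```python
import jax
import jax.numpy as jnp
import optax

# Toy stand-ins; in apg.py these would be the policy params and the APG loss.
params = {'w': jnp.ones((4, 4)), 'b': jnp.zeros(4)}

def loss_fn(p):
    return jnp.sum(p['w'] ** 2) + jnp.sum(p['b'] ** 2)

grads = jax.grad(loss_fn)(params)

# Log the global gradient norm *before* any clipping is applied.
grad_norm = optax.global_norm(grads)
jax.debug.print('grad norm before clipping: {g}', g=grad_norm)  # also works under jit

# Clipping via optax (the max norm of 1.0 is hypothetical).
clipper = optax.clip_by_global_norm(1.0)
clipped_grads, _ = clipper.update(grads, clipper.init(params))
```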
To reproduce
Environment
nvidia-smi
Gradient norm from HalfCheetah (figure)
Gradient norm from Ant (figure)