Q Overestimation - Githubissues

twni2016 / pomdp-baselines

Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022

https://sites.google.com/view/pomdp-baselines

MIT License

Q Overestimation #16

Open smorad opened 1 year ago

smorad commented 1 year ago

I'm rerunning the velocity baselines in the POMDP directory and I'm observing exploding Q-values fairly often. Is this something you experienced during training? TD3 seems to avoid the overestimation bias, but its returns seem low. Any tips for getting more stable returns across trials without massive batch sizes?

twni2016 commented 1 year ago

Yes, I found overestimation and also gradient explosion when training LSTM TD3 in some hard environments like Walker-V. A simple remedy may be to add gradient clipping to avoid the explosion, although I don't expect this to fix the overestimation issue itself.
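
For reference, here is a minimal sketch of what gradient clipping on a recurrent critic could look like in PyTorch. It is not taken from the pomdp-baselines code: `TinyRecurrentCritic`, `critic_update`, the tensor shapes, and `MAX_GRAD_NORM = 1.0` are all hypothetical choices for illustration. The only library call that matters is `torch.nn.utils.clip_grad_norm_`, which rescales the gradients in place and returns their total norm before clipping, so it can also be logged to spot explosions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyRecurrentCritic(nn.Module):
    """Hypothetical stand-in for an LSTM critic; not the repo's model."""

    def __init__(self, obs_dim=4, hidden_dim=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq):
        hidden, _ = self.lstm(obs_seq)
        return self.head(hidden)  # shape: (batch, time, 1)


critic = TinyRecurrentCritic()
optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4)
MAX_GRAD_NORM = 1.0  # assumed threshold; tune per environment


def critic_update(obs_seq, td_target):
    """One critic step with global-norm gradient clipping."""
    loss = nn.functional.mse_loss(critic(obs_seq), td_target)
    optimizer.zero_grad()
    loss.backward()
    # Rescales all gradients in place so their global L2 norm is at most
    # MAX_GRAD_NORM; the return value is the norm *before* clipping.
    grad_norm_before = torch.nn.utils.clip_grad_norm_(
        critic.parameters(), MAX_GRAD_NORM
    )
    optimizer.step()
    return loss.item(), float(grad_norm_before)


# Dummy batch: 16 sequences of 10 steps, with deliberately large targets
# to mimic the scale of exploding Q-values.
obs_seq = torch.randn(16, 10, 4)
td_target = torch.full((16, 10, 1), 100.0)
loss_value, grad_norm_before = critic_update(obs_seq, td_target)
```

Logging `grad_norm_before` over training shows how often the threshold is actually hit. As noted above, clipping only bounds the size of each update; it does not by itself remove the overestimation in the Q targets.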