Hi, thanks for your work!
When I run CQL in the pen-binary environment, I find that its value function always tends to diverge (I tried mixing ratios of 0.0 and 0.5, each with 5 random seeds). The critics give very large estimates, so the agent cannot make progress during online fine-tuning. Any ideas or suggestions on how to fix this overestimation issue? I see that a double critic is already being used. Thanks so much!
Best, Hai