Online value function divergence for cql

nakamotoo / Cal-QL

official implementation for our paper Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

https://nakamotoo.github.io/Cal-QL

Online value function divergence for cql #9

Open zhonghai1995 opened 2 months ago

zhonghai1995 commented 2 months ago

Hi, thanks for your work!

When I run CQL in the pen-binary environment, its value function always tends to diverge (I tried mixing ratios 0.0 and 0.5, each with 5 random seeds). The critics give very large estimates, so the agent cannot make progress during online fine-tuning. Any ideas or suggestions on how to fix this overestimation issue? I see that a double critic is already being used. Thanks so much!

Best, Hai

zhonghai1995 commented 2 months ago

I also observe that this overestimation issue exists for Cal-QL as well, which could cause the online performance of Cal-QL to degrade.

[Images: W&B Chart 8_27_2024, 11_50_37 AM; W&B Chart 8_27_2024, 11_48_50 AM (1)]
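One way to quantify the overestimation described in this thread is to compare the critic's Q estimates along a finished rollout with the Monte Carlo return-to-go actually obtained on that rollout. The following is only a minimal sketch and is not taken from the Cal-QL codebase: the function names `discounted_returns` and `overestimation_gap`, the default discount of 0.99, and the plain-list inputs are all assumptions made for illustration. It also assumes the episode ended in a true terminal state, since a time-limit truncation would make the Monte Carlo return an underestimate.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return-to-go for every step of one finished trajectory.

    `rewards` is a list of per-step rewards; `gamma` is an assumed discount.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def overestimation_gap(q_values, rewards, gamma=0.99):
    """Mean of (critic estimate - Monte Carlo return) over one trajectory.

    `q_values[t]` is a hypothetical critic output Q(s_t, a_t) logged while
    rolling out the current policy. A large positive gap that keeps growing
    during fine-tuning suggests the critic is overestimating.
    """
    mc_returns = discounted_returns(rewards, gamma)
    if len(q_values) != len(mc_returns):
        raise ValueError("q_values and rewards must have the same length")
    gaps = [q - g for q, g in zip(q_values, mc_returns)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy numbers, not real experiment data.
    rewards = [0.0, 0.0, 1.0]
    q_values = [10.0, 10.0, 10.0]
    print(discounted_returns(rewards, gamma=0.5))
    print(overestimation_gap(q_values, rewards, gamma=0.5))
```

Logging this gap alongside the average Q value for each seed would show whether the large estimates sit far above the returns the policy really achieves, or whether they are merely large in absolute scale.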