High Policy Loss in SAC_CQL

waffoo / accel

accelerate reinforcement learning

MIT License


High Policy Loss in SAC_CQL #19

Open waffoo opened 3 years ago

waffoo commented 3 years ago

`policy_loss` in SAC_CQL is significantly higher than in the official implementation when tested on hopper-expert-v0 from d4rl. https://github.com/waffoo/accel/blob/af3f511ea816b2dd80346fe5a0b5e2b395c190ad/accel/agents/sac_cql.py#L261

With the author's implementation, the loss drops below -350, whereas with accel it does not even reach -300, which leads to slower and less stable learning.
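For context on why the loss sits at such large negative values: assuming the standard SAC actor objective (which may differ in detail from the line linked above), the policy loss is roughly the negated Q-value of the sampled actions plus a small entropy term. A less negative loss therefore mostly means lower Q-estimates for the policy's actions. A minimal NumPy sketch, where the function name and batch values are hypothetical and chosen only for illustration:

```python
import numpy as np


def sac_policy_loss(log_pi, q1, q2, alpha):
    """Standard SAC actor loss: mean(alpha * log_pi - min(q1, q2)).

    This is the textbook form, not necessarily the exact code in accel.
    """
    min_q = np.minimum(q1, q2)  # clipped double-Q estimate
    return float(np.mean(alpha * log_pi - min_q))


# Hypothetical batch values, for illustration only.
log_pi = np.array([-1.0, -2.0, -3.0])   # log-probabilities of sampled actions
q1 = np.array([300.0, 310.0, 320.0])    # first critic
q2 = np.array([305.0, 300.0, 330.0])    # second critic

loss = sac_policy_loss(log_pi, q1, q2, alpha=0.5)
print(loss)
```

With Q-values around 300, the loss lands near -300, so the gap between -300 and -350 corresponds to a gap of about 50 in the critics' estimates rather than to anything in the entropy term.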

waffoo commented 3 years ago

The unstable performance seems to be caused by my treating timeout frames as done frames in the preprocessing here. I'll fix that line later, but the high policy loss still needs to be investigated.
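A minimal sketch of the kind of preprocessing fix this implies, assuming the dataset is a D4RL-style dict of flat arrays that exposes both `terminals` and `timeouts` (some dataset versions may not include `timeouts`). The helper names `build_transitions` and `td_target` are hypothetical and are not accel's actual functions. The idea is that only true terminals should zero out the bootstrap term, and steps that ended by timeout are dropped because the next row in the flat array belongs to a different episode:

```python
import numpy as np


def build_transitions(dataset):
    """Build (s, a, r, s', done) arrays from a D4RL-style dict of flat arrays.

    `done` is 1 only for true terminal states. Steps that ended purely by
    timeout are dropped, since the following row starts a new episode and
    is not a valid next observation. The last row has no successor and is
    dropped as well.
    """
    obs = np.asarray(dataset["observations"], dtype=np.float32)
    actions = np.asarray(dataset["actions"], dtype=np.float32)
    rewards = np.asarray(dataset["rewards"], dtype=np.float32)
    terminals = np.asarray(dataset["terminals"], dtype=bool)
    timeouts = np.asarray(dataset["timeouts"], dtype=bool)

    # Keep a step unless it ended by timeout without being a real terminal.
    keep = terminals[:-1] | ~timeouts[:-1]
    idx = np.nonzero(keep)[0]

    return {
        "observations": obs[idx],
        "actions": actions[idx],
        "rewards": rewards[idx],
        "next_observations": obs[idx + 1],
        "done": terminals[idx].astype(np.float32),  # timeouts never set done
    }


def td_target(rewards, next_q, done, gamma=0.99):
    """One-step target: bootstrap from next_q unless the state is terminal."""
    return rewards + gamma * (1.0 - done) * next_q


# Toy data: episode 1 (rows 0-2) ends by timeout, episode 2 (rows 3-5)
# ends in a true terminal, row 6 starts a third episode.
toy = {
    "observations": np.arange(7, dtype=np.float32).reshape(7, 1),
    "actions": np.zeros((7, 1), dtype=np.float32),
    "rewards": np.ones(7, dtype=np.float32),
    "terminals": np.array([0, 0, 0, 0, 0, 1, 0], dtype=bool),
    "timeouts": np.array([0, 0, 1, 0, 0, 0, 0], dtype=bool),
}

batch = build_transitions(toy)
targets = td_target(batch["rewards"], np.full(5, 10.0, dtype=np.float32), batch["done"])
print(batch["done"], targets)
```

Treating a timeout as done would instead cut the bootstrap term at an arbitrary, non-terminal state, which biases the Q-targets downward for those states.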