huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
9.69k stars 1.22k forks

UserWarning: KL divergence is starting to become negative: -0.18 #351

Closed: kebijuelun closed this issue 1 year ago

kebijuelun commented 1 year ago

Looking forward to your reply. Thanks.

lvwerra commented 1 year ago

That should be fine. A slightly negative value like -0.18 is only an issue if the KL keeps getting more negative as training goes on. @younesbelkada maybe we should only warn when it drops below some threshold, e.g. < -1?
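For readers following along, here is a minimal sketch of the thresholded warning proposed above. It is not TRL's actual implementation: the function names `mean_kl_estimate` and `warn_if_kl_too_negative` and the `threshold` parameter are hypothetical, and the KL is assumed to be estimated as the mean of per-token log-probability differences between the policy and the reference model. An estimate built this way from sampled tokens can dip slightly below zero on a given batch even though the true KL is non-negative.

```python
import warnings


def mean_kl_estimate(logprobs, ref_logprobs):
    """Sample-based KL estimate (hypothetical helper).

    Averages log pi(x) - log pi_ref(x) over the sampled tokens.
    The result can be negative for a particular batch.
    """
    diffs = [lp - ref for lp, ref in zip(logprobs, ref_logprobs)]
    return sum(diffs) / len(diffs)


def warn_if_kl_too_negative(mean_kl, threshold=-1.0):
    """Warn only when the estimate falls below `threshold` (assumed default: -1).

    Returns True if a warning was emitted, False otherwise.
    """
    if mean_kl < threshold:
        warnings.warn(
            f"KL divergence is starting to become negative: {mean_kl:.2f}",
            UserWarning,
        )
        return True
    return False


# A value like the one in this issue stays silent under the -1 threshold.
warn_if_kl_too_negative(-0.18)
```

With this kind of check, the -0.18 reported in the issue title would no longer trigger a warning, while a value such as -1.5 still would.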

younesbelkada commented 1 year ago

Sounds good, @lvwerra!