UserWarning: KL divergence is starting to become negative: -0.18

kebijuelun commented 1 year ago

Hi, thanks for the great StackLLaMA work. I ran the RL experiment but got the following UserWarning. Does this matter for RL model training?

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
0%|          | 1/2615 [01:23<60:49:43, 83.77s/it]/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.18 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.07 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
0%|          | 2/2615 [02:49<61:32:03, 84.78s/it]/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.51 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
0%|          | 3/2615 [04:07<59:17:45, 81.72s/it]/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.50 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.15 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
0%|          | 6/2615 [08:27<62:31:58, 86.29s/it]/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.08 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.09 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
0%|          | 8/2615 [11:17<62:01:16, 85.64s/it]/data/public/aic/lwz/lwz_code/trl/trl/trainer/ppo_trainer.py:1088: UserWarning: KL divergence is starting to become negative: -0.11 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.
warnings.warn(
0%|          | 10/2615 [14:05<61:11:22, 84.56s/it]/root/code/transformers/src/transformers/pipelines/base.py:1080: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset

The environment I used is the one described in https://github.com/lvwerra/trl/issues/343#issuecomment-1537144381:
- transformers 4.29.0.dev0
- peft 0.4.0.dev0
- removed the layer_norm_names in the model definition function

Looking forward to your reply. Thanks.

lvwerra commented 1 year ago

That should be fine. It is only an issue if it keeps becoming more negative over longer training. @younesbelkada maybe we can only warn when it's e.g. < -1?
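A minimal sketch of what such a thresholded warning could look like. The function name `warn_if_kl_negative` and its `threshold` parameter are hypothetical and are not taken from the trl code base; only the warning text and the -1 cutoff come from this thread.

```python
import warnings


def warn_if_kl_negative(mean_kl: float, threshold: float = -1.0) -> bool:
    """Warn only when the mean KL estimate falls below `threshold`.

    Hypothetical helper: small negative values such as -0.18 stay silent,
    while values below the cutoff (here -1, as suggested above) still warn.
    Returns True if a warning was emitted.
    """
    if mean_kl < threshold:
        warnings.warn(
            f"KL divergence is starting to become negative: {mean_kl:.2f} - "
            "this might be a precursor for failed training.",
            UserWarning,
        )
        return True
    return False
```

With this cutoff, the values reported above (-0.07 to -0.51) would not trigger a warning, while a drift to something like -1.5 would.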

younesbelkada commented 1 year ago

Sounds good @lvwerra !
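The warning text mentions generation kwargs as a possible cause. One way to see why is a toy example. It assumes the KL value is estimated as the average of log p_policy(token) - log p_ref(token) over the sampled tokens. That is an assumption about the estimator, and the three-token distributions below are made up for illustration. If tokens are sampled from a truncated or greedy distribution rather than from the policy itself, the expected log-ratio can be negative even though the true KL is positive.

```python
import math


def expected_log_ratio(sampling, policy, reference):
    """Expected value of log(policy/reference) when tokens are drawn from `sampling`."""
    return sum(
        s * (math.log(p) - math.log(r))
        for s, p, r in zip(sampling, policy, reference)
        if s > 0
    )


# Made-up next-token distributions over a three-token vocabulary.
policy = [0.4, 0.3, 0.3]
reference = [0.6, 0.2, 0.2]

# Sampling from the policy itself gives the true KL(policy || reference), which is >= 0.
true_kl = expected_log_ratio(policy, policy, reference)

# Greedy (top-1) decoding always picks token 0, where the reference is more confident
# than the policy, so the estimate turns negative.
greedy = [1.0, 0.0, 0.0]
mismatched_estimate = expected_log_ratio(greedy, policy, reference)

print(f"true KL: {true_kl:.3f}, greedy estimate: {mismatched_estimate:.3f}")
```

Small negative values can also come from ordinary sampling noise in a finite batch, which fits the advice above that only a steadily more negative trend is a concern.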

huggingface / trl

UserWarning: KL divergence is starting to become negative: -0.18 #351