jackaduma / Vicuna-LoRA-RLHF-PyTorch

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna.
MIT License
208 stars, 18 forks

SFT with large loss {'loss': 388082722196684.8, 'learning_rate': 0.0, 'epoch': 0.02} #11

Open LeiShenVictoria opened 1 year ago

LeiShenVictoria commented 1 year ago

The SFT loss grows very large, then becomes inf, and finally nan:

{'loss': 240023.775, 'learning_rate': 0.0002762320648783531, 'epoch': 0.47}
{'loss': inf, 'learning_rate': 0.0002743605739238927, 'epoch': 0.49}
{'loss': nan, 'learning_rate': 0.00027248908296943227, 'epoch': 0.51}
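
For anyone triaging a run like this, here is a minimal sketch (not part of the repository) that scans logged entries and reports the first one whose loss is no longer finite. The helper name `first_non_finite` is hypothetical, and the entries are the three quoted in this report, with `inf` and `nan` written as Python floats.

```python
import math

# Log entries copied from this report; inf/nan written as Python floats.
logs = [
    {'loss': 240023.775, 'learning_rate': 0.0002762320648783531, 'epoch': 0.47},
    {'loss': float('inf'), 'learning_rate': 0.0002743605739238927, 'epoch': 0.49},
    {'loss': float('nan'), 'learning_rate': 0.00027248908296943227, 'epoch': 0.51},
]

def first_non_finite(entries):
    """Return the first entry whose loss is inf or nan, or None if all are finite."""
    for entry in entries:
        if not math.isfinite(entry['loss']):
            return entry
    return None

bad = first_non_finite(logs)
print(bad['epoch'] if bad else "all losses finite")
```

Knowing the first non-finite step narrows down which batch and learning rate to inspect. It does not by itself explain why the loss diverged.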