A full pipeline to finetune the Vicuna LLM with LoRA and RLHF on consumer hardware: an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
MIT License
SFT with large loss {'loss': 388082722196684.8, 'learning_rate': 0.0, 'epoch': 0.02} #11
The SFT loss is very large, then becomes `inf` and finally `nan`:

```
{'loss': 240023.775, 'learning_rate': 0.0002762320648783531, 'epoch': 0.47}
{'loss': inf, 'learning_rate': 0.0002743605739238927, 'epoch': 0.49}
{'loss': nan, 'learning_rate': 0.00027248908296943227, 'epoch': 0.51}
```