Randl opened this issue 11 months ago
I've run the training on two machines without changing any hyperparameters except for the per-device batch size and gradient accumulation steps, chosen so that the global batch size matches the one in the repo:

- The first run is exactly as in the repo and gets an eval loss of 1.0667: https://wandb.ai/evgeniizh/huggingface/runs/pskgg48d
- The second run adds warmup (https://github.com/huggingface/alignment-handbook/pull/31, https://github.com/huggingface/alignment-handbook/pull/71) and uses TRL from master (which fixes https://github.com/huggingface/alignment-handbook/issues/61); it gets an eval loss of 1.0927: https://wandb.ai/evgeniizh/huggingface/runs/9ez7kl7s
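For clarity, this is what "matching the global batch size" means here; a minimal sketch of the bookkeeping, where all concrete values (batch sizes, GPU counts) are hypothetical and not taken from my actual configs:

```python
# Global batch size = per-device batch size * grad accumulation * number of GPUs.
# When the GPU count changes, gradient accumulation is adjusted so the
# product stays equal to the reference setup's global batch size.

def global_batch_size(per_device_batch_size: int,
                      gradient_accumulation_steps: int,
                      num_gpus: int) -> int:
    return per_device_batch_size * gradient_accumulation_steps * num_gpus

# Hypothetical reference setup (e.g. 8 GPUs):
reference = global_batch_size(per_device_batch_size=4,
                              gradient_accumulation_steps=2,
                              num_gpus=8)   # 64

# Hypothetical setup with fewer GPUs: double gradient accumulation to compensate.
mine = global_batch_size(per_device_batch_size=4,
                         gradient_accumulation_steps=4,
                         num_gpus=4)        # 64

assert reference == mine
```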
The official SFT model gets a much lower eval loss of 0.99: https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora
Possibly related to https://github.com/huggingface/alignment-handbook/issues/45