Open · psinger opened this issue 8 months ago
I believe this is necessary for getting the best results when fine-tuning Llama 3, although there seems to be some confusion (https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/9).
Just as a follow-up: I implemented a hacky version of this to help with training Llama 3, and adding BOS tokens to prompts and answers when fine-tuning the Llama 3 8B base model did lower my loss by a small but significant margin (I tried many different seeds to make sure it was reproducible).
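For concreteness, here is a minimal sketch of what "adding BOS tokens to prompts and answers" can look like with a Hugging Face tokenizer; the helper name and the prompt/answer split are assumptions, not the exact patch used here:

```python
# Hedged sketch: prepend the BOS id (and append EOS) when building input_ids
# for a prompt/answer pair. Assumes a Hugging Face tokenizer whose
# bos_token_id is set, as is the case for Meta-Llama-3-8B.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

def encode_example(prompt: str, answer: str) -> list[int]:
    # Tokenize without special tokens so BOS/EOS placement is fully explicit.
    prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)
    answer_ids = tokenizer.encode(answer, add_special_tokens=False)
    return [tokenizer.bos_token_id] + prompt_ids + answer_ids + [tokenizer.eos_token_id]
```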
You can always just hardcode the BOS token string into the prompt separator (a rough sketch of that workaround is below).
That said, I am personally not convinced it has a big impact for fine-tuning.
We should still add an option for it.
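For reference, a rough sketch of the hardcoding workaround mentioned above; the BOS string shown is the Llama 3 one, and the template itself is purely illustrative:

```python
# Hedged sketch: bake the literal BOS string into the prompt template so it is
# present even without native support. The BOS string is model-specific
# ("<|begin_of_text|>" for Llama 3, "<s>" for Llama 2), and this template is
# only an example.
BOS = "<|begin_of_text|>"

prompt_template = BOS + "### Instruction:\n{instruction}\n\n### Response:\n"
prompt = prompt_template.format(instruction="Summarize the article below.")
```

Note that this only works if the tokenizer maps the literal BOS string back to the special token id rather than splitting it into sub-tokens.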
Yes, good point... it still seems desirable to have native support in the UX, though.
🚀 Feature
Similar to the EOS token, we should offer an option to add a BOS token at the beginning of the sequence. This might be useful for models like Gemma.
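A possible shape for such an option, mirroring the EOS handling; the helper and the `add_bos`/`add_eos` parameter names are assumptions, not the library's existing API:

```python
# Hedged sketch of a tokenize helper with an opt-in BOS flag.
def tokenize(tokenizer, text: str, add_bos: bool = False, add_eos: bool = True) -> list[int]:
    ids = tokenizer.encode(text, add_special_tokens=False)
    if add_bos and tokenizer.bos_token_id is not None:
        ids = [tokenizer.bos_token_id] + ids
    if add_eos and tokenizer.eos_token_id is not None:
        ids = ids + [tokenizer.eos_token_id]
    return ids
```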