Hi,
I'm working on "Continued pretraining - Korean + Unsloth.ipynb" with Llama-3 8B. When preparing the data for continued pretraining, the EOS_TOKEN doesn't seem to be appended to the loaded Wikipedia dataset, even though formatting_prompts_func is defined to add it. Could you let me know whether this is a mistake or deliberate?
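For context, here is a minimal sketch of the pattern I'm referring to. The dataset id, the "text" column name, and the map call are my reconstruction from memory, not copied verbatim from the notebook:

```python
from datasets import load_dataset

EOS_TOKEN = tokenizer.eos_token  # tokenizer comes from the earlier model-loading cell

def formatting_prompts_func(examples):
    # Append EOS_TOKEN to every article so the model learns where text ends.
    return {"text": [text + EOS_TOKEN for text in examples["text"]]}

# Korean Wikipedia subset; the exact dataset id here is my assumption.
dataset = load_dataset("wikimedia/wikipedia", "20231101.ko", split="train")

# My point: unless this map call runs (and its result is reassigned),
# EOS_TOKEN never actually gets applied to the loaded dataset.
dataset = dataset.map(formatting_prompts_func, batched=True)
```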
Also, for the instruction finetuning in the same notebook, can I use the SFTTrainer (the one used for regular finetuning) instead of UnslothTrainer? Is there any difference between the two?
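If it helps clarify the question, this is roughly the swap I have in mind. It's a sketch assuming the older trl API where SFTTrainer takes tokenizer and dataset_text_field directly; all hyperparameter values are placeholders. My understanding (please correct me) is that plain TrainingArguments has no separate embedding learning rate, whereas UnslothTrainingArguments exposes embedding_learning_rate for the embed_tokens / lm_head modules:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                  # assumed from earlier notebook cells
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,          # placeholder
    args=TrainingArguments(
        per_device_train_batch_size=2,
        learning_rate=5e-5,       # placeholder; no embedding_learning_rate here
        max_steps=100,
        output_dir="outputs",
    ),
)
trainer.train()
```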
Thanks in advance!