Open · HCY123902 opened 4 hours ago

Thank you for sharing your research work. I have a question about the supervised fine-tuning step, which, according to the paper, is used to initialize the base model before running SimPO. While the SFT configuration file is provided at training_configs/llama-3-8b-base-sft.yaml, may I ask for the SFT training script itself?

In issue #27, there is a comment asking how HuggingFaceH4/ultrachat_200k is processed for SFT. I would like to know this as well: HuggingFaceH4/ultrachat_200k samples are multi-turn dialogues, so I am curious about which labels are used for SFT.
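For reference, each HuggingFaceH4/ultrachat_200k sample stores the dialogue as a list of role-tagged turns. A quick inspection sketch, using only the public `datasets` API (nothing from this repo):

```python
# Peek at the multi-turn structure the question refers to.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
sample = ds[0]["messages"]  # a list of {"role": ..., "content": ...} dicts
for turn in sample:
    print(f'{turn["role"]}: {turn["content"][:80]}')
```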
Hi @HCY123902,

We used the same SFT training script as the original alignment-handbook repo: https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_sft.py

The command for SFT training is as follows:

```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py training_configs/llama-3-8b-base-sft.yaml
```

As for HuggingFaceH4/ultrachat_200k, we didn't do any specific processing of it, which means that if a dialogue is multi-turn, we train on all turns (a sketch of what this means for the labels follows below).

I hope this helps!

Best,
Yu
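To make "train on all turns" concrete, here is a minimal sketch of the label construction under that policy. It is an illustration, not the repository's code: run_sft.py delegates training to trl's SFTTrainer, which applies the standard causal-LM loss over the whole templated sequence, so no turn is masked out. The Zephyr tokenizer is used here only because it is public and ships a chat template; the actual run configures the Llama-3 base tokenizer via the YAML recipe.

```python
# Minimal sketch: labels for full-dialogue SFT on ultrachat_200k.
# Illustrative only -- the repo reaches the same effect through trl's
# SFTTrainer rather than this manual preprocessing.
from datasets import load_dataset
from transformers import AutoTokenizer

# Public tokenizer with a chat template (stand-in for the Llama-3 one).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

example = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")[0]

# Flatten the entire multi-turn dialogue into one training sequence.
text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
input_ids = tokenizer(text, truncation=True, max_length=2048)["input_ids"]

# "Train on all turns": the labels are simply the input ids, so every
# token, user and assistant turns alike, contributes to the next-token
# loss (the model shifts labels internally for causal LM training).
labels = list(input_ids)
```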