Open · yinyueqin opened this issue 8 months ago
It could be related to the version of lm-evaluation-harness. For more details, see https://github.com/uclaml/SPIN/issues/12#issuecomment-1960974723.
Additionally, after updating to the SFT checkpoint from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full, the relative improvement between iteration 0 and iteration 1 appears to be marginal. Are there any newly recommended parameter settings?
I use lm-evaluation-harness v0.4.0 for evaluation, which is consistent with the version used by the authors. In addition, the results shown above were obtained with num_train_epochs=6 during training.
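For clarity, here is one way the installed harness version could be verified before comparing numbers; treat the distribution name `lm_eval` as an assumption about how the package was installed:

```python
# Hedged sketch: verify that the installed lm-evaluation-harness matches
# the v0.4.0 release reported above. The distribution name "lm_eval" is
# an assumption about the packaging.
from importlib.metadata import version

installed = version("lm_eval")
assert installed == "0.4.0", f"expected lm-evaluation-harness 0.4.0, found {installed}"
print(f"lm-evaluation-harness version: {installed}")
```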
Hi @yinyueqin, have you managed to reproduce the performance? I cannot reproduce it either.
Hi,
Thank you for your work. We are re-running the experiments with the updated SFT checkpoint from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full and evaluating with lm-evaluation-harness v0.4.0. We've noticed a significant performance drop on GSM8k. We trained the model for 6 epochs in each iteration. Have you observed this issue, or do you have any insights into potential causes?
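For reference, a minimal sketch of how the GSM8k evaluation might be invoked through the v0.4.0 Python API; the `dtype`, batch size, and few-shot count here are assumptions, not the authors' confirmed settings:

```python
# Minimal sketch of a GSM8k run with lm-evaluation-harness v0.4.0.
# The 5-shot setting and batch size are assumptions; swap in the SPIN
# iteration checkpoint under comparison for `pretrained=...`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # HuggingFace transformers backend
    model_args="pretrained=alignment-handbook/zephyr-7b-sft-full,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,  # GSM8k is commonly reported 5-shot (assumed here)
    batch_size=8,   # assumed
)
print(results["results"]["gsm8k"])
```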