Hello, thank you for your interest and detailed comments.
2. For iterations, it only depends on which model you are using for data generation. In our experiment setting, it's using alignment-handbook/zephyr-7b-sft-full to generate data at iteration 0. For replicating our 50k data generation, simply use UCLA-AGI/SPIN_iter0. The generation program primarily utilizes the provided questions/prompts for model generation. For other data, just convert the data into the same format.

Thank you very much for the suggestions! We will make changes accordingly.
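For context, a minimal sketch of that conversion might look like the following, assuming the target format matches UCLA-AGI/SPIN_iter0, where each example pairs a "real" conversation (prompt plus ground-truth answer) with a "generated" one (prompt plus model output); the source field names (instruction, output, model_output) are hypothetical placeholders, not the repo's API:

```python
# Minimal sketch, assuming the SPIN_iter0 layout: each example holds a
# "real" and a "generated" two-turn conversation. Source field names
# ("instruction", "output", "model_output") are hypothetical placeholders.
from datasets import load_dataset

def to_spin_format(example):
    prompt = example["instruction"]  # hypothetical source field
    return {
        "real": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": example["output"]},        # ground truth
        ],
        "generated": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": example["model_output"]},  # model output
        ],
    }

dataset = load_dataset("json", data_files="my_data.json", split="train")
dataset = dataset.map(to_spin_format, remove_columns=dataset.column_names)
```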
@yihedeng9 Thanks for your prompt answer.
2. For iterations, it only depends on which model you are using for data generation. In our experiment setting, it's using alignment-handbook/zephyr-7b-sft-full to generate data at iteration 0. For replicating our 50k data generation, simply use UCLA-AGI/SPIN_iter0. The generation program primarily utilizes the provided questions/prompts for model generation. For other data, just convert the data into the same format.
So, you are using Alpaca's prompt template for data generation with the model alignment-handbook/zephyr-7b-sft-full to produce UCLA-AGI/SPIN_iter0, right? I guess this possibly leads to worse initial data compared with data generated using Zephyr's prompt template. I just want to confirm whether this is a deliberate trick or whether it is okay to use the prompt template of the original model.
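For reference, checking which template a prompt builds is straightforward with the tokenizer's own chat template; a minimal sketch (this does not reflect the repo's actual generation code):

```python
# Minimal sketch: render a prompt with the model's own (Zephyr) chat
# template via transformers; this is not the repo's generation code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")
messages = [{"role": "user", "content": "Explain self-play fine-tuning briefly."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # shows Zephyr's <|user|>/<|assistant|> formatting, not Alpaca's
```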
The maximum output length is adjustable. We set it to 256 for quicker generation demonstrations, but 512 and 1024 are also viable options. Similarly, the temperature parameter can indeed be modified for efficiency.
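These knobs map onto standard generation arguments; a minimal sketch with plain transformers, where the parameter values mirror the comment above rather than the repo's exact script:

```python
# Minimal sketch of the generation knobs discussed above, using plain
# transformers generate(); values mirror the comment, not the repo's script.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("What is self-play fine-tuning?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # 512 or 1024 are also viable, at higher generation cost
    do_sample=True,
    temperature=0.7,     # hypothetical value; tunable for quality/speed trade-offs
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```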
Yes, but does this affect the performance of the models?
- Our initial configuration (scheduler, optimizer, and beta) follows the DPO training setup for Zephyr as outlined in the Alignment Handbook (https://github.com/huggingface/alignment-handbook/blob/61a11a5c7d66179ed0a930b0dd12e532fce701dd/recipes/zephyr-7b-beta/dpo/config_full.yaml). While their latest update suggests using Adam and a cosine scheduler, the previous set of configurations is sufficiently effective for our experiments. Nonetheless, we do recommend exploring these settings further for your specific use cases. In our experience, the two sets of configs yield similar final performance.
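For readers comparing the two setups, a rough sketch of the difference as plain Python dicts; the values are recalled from the linked recipe and should be verified against the YAML before use:

```python
# Rough sketch of the two config sets discussed above, as plain dicts.
# Values are recalled from the linked alignment-handbook recipe and may
# not match the YAML exactly; verify before use.
original_dpo_config = {
    "beta": 0.1,                    # DPO beta
    "learning_rate": 5.0e-7,
    "lr_scheduler_type": "linear",
    "optim": "rmsprop",
}
updated_dpo_config = {
    **original_dpo_config,
    "lr_scheduler_type": "cosine",  # later handbook update
    "optim": "adamw_torch",
}
```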
Yes, the results on alignment-handbook/zephyr-7b-sft-full are similar to the scores in the paper for cosine and AdamW. But it does not work that well with another model and other datasets. Could you please give me some advice (e.g., dataset selection, prompt usage, etc.)?
Thanks. I suppose the data is the main problem. I will need to find some high-quality datasets like ultrachat-200k in my language.
Hi, thanks for open-sourcing your work. That's awesome. I have several questions about the details of the implementation.
In generate.sh, it starts with the data preparation work for iter 1. I've also skimmed the code. Here are some potential improvements.
- Reducing calls to cpu() and mean() for faster metrics collection on slower CPUs, as in the sketch below.
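A minimal sketch of that suggestion, assuming per-batch reward tensors on the GPU; the names here are illustrative, not the repo's actual code:

```python
import torch

def collect_metrics(reward_batches: list) -> dict:
    # Reduce on-device first; the single .item() call below triggers one
    # device-to-host copy instead of one cpu()/mean() pair per batch.
    stacked = torch.stack([r.mean() for r in reward_batches])
    return {"rewards/mean": stacked.mean().item()}
```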