This is an example implementation of REBASE as described in the paper "An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models".
git clone --recurse-submodules git@github.com:thu-wyz/rebase.git
This command clones our repository together with the sglang repository as a submodule. The sglang submodule should be on the reward-model branch, which we modified slightly to support our reward model. The benchmark datasets are MATH and GSM8K.
To install SGLang and the other dependencies:
cd sglang
pip install -e "python[all]"
Our finetuning code for policy models and reward models is based on gpt-accelera. You can find the models on Hugging Face: Llemma-7b, Llemma-34b, and the Llemma reward model.
You can use tmux to start the servers, or run them in the background by appending & to each command. Make sure the paths in the scripts are correct for your machine.
bash ./scripts/run_policy.sh
bash ./scripts/run_reward.sh
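If you prefer background processes over tmux, the following is a minimal sketch of one way to do it. The helper name start_bg and the logs/ directory are hypothetical and not part of this repository; only the two script paths come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run a command in the
# background, send its output to logs/<name>.log, and record its PID in
# logs/<name>.pid so the process can be stopped later.
start_bg() {
  local name="$1"
  shift
  mkdir -p logs
  # nohup keeps the process alive after the terminal is closed.
  nohup "$@" > "logs/${name}.log" 2>&1 &
  echo "$!" > "logs/${name}.pid"
}

# Usage with the server scripts of this repository:
#   start_bg policy bash ./scripts/run_policy.sh
#   start_bg reward bash ./scripts/run_reward.sh
# Follow a log:   tail -f logs/policy.log
# Stop a server:  kill "$(cat logs/policy.pid)"
```

Keeping one log file per server makes it easier to confirm that both servers have finished loading before the next step.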
Before running REBASE, set the hyperparameters in the YAML file. Then run:
bash ./scripts/rebase.sh
bash ./scripts/evaluate.sh
@misc{wu2024empiricalanalysiscomputeoptimalinference,
title={An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models},
author={Yangzhen Wu and Zhiqing Sun and Shanda Li and Sean Welleck and Yiming Yang},
year={2024},
eprint={2408.00724},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2408.00724},
}