Note (October 15, 2024): Cleaning up this code is a work in progress. This repo will be regularly updated over the next several weeks. Please reach out if you are trying to use it and encounter problems.
This code runs the experiments in *RATE: Score Reward Models with Imperfect Rewrites of Rewrites*.

RATE measures how much a concept (e.g., the length or sentiment of a completion) affects a reward model's score. For each example, an LLM rewrites the completion to flip the concept, then rewrites the rewrite to flip it back; comparing scores between the rewrite of the rewrite and the rewrite means both sides carry the artifacts of the rewriting process, which corrects for imperfect rewrites. The experiments run in three phases: generating rewrite datasets, scoring them with reward models, and calculating treatment effects.
First, install the conda environment located in the root of the directory:

```bash
conda env create -f environment.yaml
```

If you plan to run the scripts directly, activate the environment (if you plan to use the Make commands, they will manage the environment for you):

```bash
conda activate rate
```
Create a `.env` file located in the root of the directory with your OpenAI API key, the project directory where data will be saved, and your SLURM group and partition (the default `users` group is fine). Here's an example `.env` file:
```
OPENAI_API_KEY=yOuRoPeNaIaPiKeY123
PROJECT_DIR=path/to/where/the/data/will/save
GROUP_NAME=users
PARTITION=general
```
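The scripts read these values from the environment. A minimal sketch of how that can look, assuming `python-dotenv` is available in the conda environment (the repo's actual loading code may differ):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file at the repo root into the process environment

PROJECT_DIR = os.environ["PROJECT_DIR"]  # where datasets and outputs are written
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # used for rewrite API calls
```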
Generated data is saved under the `PROJECT_DIR` specified in your `.env` file (datasets land in a `data` folder there), and logs are written to `logs/` in the repo.
There are three key parts of this experiment: generating datasets of counterfactual rewrites, scoring the rewrites with reward models, and calculating treatment effects. Each of these can be run separately; see the appropriate section for more detailed instructions.
We use a single config file for all of the settings to run experiments. The default config file is located at `experiments/config.yaml`. (Note: you can make additional config files for different experiment setups, which is necessary if you plan to schedule multiple experiments via SLURM.)

The config file is broken up into the three "phases" of the experiment, and the settings for each are separate. Details on the settings for each phase are in its section below, but here is an example config file for the IMDB dataset, rewriting on the concept "length" and scoring using the ArmoRM reward model:
```yaml
smoke_test: true

rewrites:
  dataset_name: "imdb_length" # This is used in a factory function to import the dataset template; must match a template in dataset_templates/

scoring:
  model: "armorm" # Choices: "distilbert_positive", "distilbert_negative", "deberta", "armorm", "sfairxc", "ncsoft"
  dataset_folder: "scored" # Choices: "rewrites", "scored"
  dataset_name: "imdb_length" # Note: used in the output filename, so update to match the dataset filename below (INCLUDE CONCEPT)
  dataset_filename: "archive/imdb_length_sfairxc_scored_20240918_195038.jsonl"

effects:
  dataset_name: "imdb" # Note: this is used to create the filename for the calculated effects
  concept: "length"
  score: "armorm"
  reward_key: "ArmoRM" # Note: this is the key for the reward in the dataset
  dataset_filename: "imdb_sentiment_complete_scored_20240919_152739.jsonl"
```
If you set `smoke_test: true` in your config file, the experiments will run with smaller datasets to avoid wasting API calls or compute resources. Check the `__main__` block in each script to review the smoke test limits.

Logs are created in `logs/` when running each script with Make commands. Logging is configured in `experiments/constants.py`.
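For illustration, the smoke-test guard inside a script's `__main__` block might look roughly like this; the limit shown is hypothetical, so check each script for its actual value:

```python
SMOKE_TEST_LIMIT = 10  # hypothetical; each script defines its own limit


def maybe_truncate(dataset, smoke_test: bool):
    """Keep only a few examples when smoke-testing to save API calls/compute."""
    if smoke_test:
        return dataset.select(range(SMOKE_TEST_LIMIT))  # datasets.Dataset.select
    return dataset
```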
To generate datasets, we go through the following process: load a base dataset of original completions, label each example for the concept of interest, and query an LLM to produce counterfactual rewrites (and rewrites of those rewrites).

Each dataset is managed by a dataset template that lives in `dataset_templates/`. The dataset template specifies the dataset name, the original completions, the strings describing each value of the concept for the rewrite prompt, a classifier for the concept (a `lambda` function applied to each example in the dataset or some other form of classifier), functions that return the original completion and the prompt used when scoring, the rewrite prompt, and the rewrite model and temperature.

After running this script, the completed dataset and the intermediate files used for API submission will be saved to a `data` folder in the project directory specified in your `.env` file.
Here is an example dataset template. Key things to notice: the original completions can come from any dataset loaded with `load_dataset`, the `w_strings` give the wording used to rewrite toward each concept value, and the `rewrite_prompt` asks the model to change the concept and *nothing* else.

```python
from datasets import load_dataset

dataset = load_dataset("imdb")

dataset_template = {
    "dataset_name": "imdb_length",
    "original_completions": dataset["train"],
    "w_strings": {
        "w=1": "longer",
        "w=0": "shorter",
    },  # Note: if w=1 for the original completion, rewrite using w=0
    "w_classifier": lambda x: len(x) > 970,
    "get_original_completion": lambda x: x["text"],
    "reward_question": lambda x: "Write a movie review: ",  # Note: the reward models need a prompt to score completions
    "rewrite_prompt": """{original_completion}\n\n[Adjust this review so it's {w_counterfactual_string}, but change *nothing* else.]""",
    "model": "gpt-4o-2024-08-06",
    "temperature": 0.7,
}
```
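To make the fields concrete, here is a hypothetical sketch of how the generation script can consume this template for a single example (the actual pipeline batches these prompts for the OpenAI API):

```python
example = dataset_template["original_completions"][0]
completion = dataset_template["get_original_completion"](example)

# Classify the concept; here we assume the classifier runs on the raw text.
w = dataset_template["w_classifier"](completion)

# If w=1 for the original completion, rewrite using w=0 (and vice versa).
w_counterfactual_string = dataset_template["w_strings"]["w=0" if w else "w=1"]

rewrite_prompt = dataset_template["rewrite_prompt"].format(
    original_completion=completion,
    w_counterfactual_string=w_counterfactual_string,
)
```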
Caution: Ensure that no Conda environment (other than the default `base`) is active before running Make commands. If you have another environment active, deactivate it with `conda deactivate`. The Makefile will automatically manage the Conda environment for you during job execution.
When you have created a template for your dataset (and updated the config yaml file with the appropriate settings), you can schedule this as a SLURM job using Make:
```bash
make create_dataset
```
The datasets we used in our experiments (e.g., `imdb_length` and `imdb_sentiment`) each have a template in `dataset_templates/`.
To score datasets, we need to update the `config.yaml` file and make sure we've created a scoring template for our reward model.

These are the relevant fields in the yaml file. Make sure the `model` field aligns with the name of the scoring template. `dataset_folder` and `dataset_filename` specify where the dataset you want to score is saved, and `dataset_name` is used when creating the scored output file.
```yaml
scoring:
  model: "armorm" # Choices: "distilbert_positive", "distilbert_negative", "armorm", "sfairxc", "ncsoft"
  dataset_folder: "scored" # Choices: "rewrites", "scored"
  dataset_filename: "archive/imdb_length_sfairxc_scored_20240918_195038.jsonl"
  dataset_name: "imdb_length" # Note: used in the output filename, so update to match the dataset filename above (INCLUDE CONCEPT)
```
We define a scoring template for each reward model in `scoring_templates/`. `reward_model`, `reward_tokenizer`, and `_score_example` are imported via a factory function. Make sure that the model name in your `config.yaml` and the file name in `scoring_templates/` match.
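A factory function along these lines can perform that import; this is an illustrative sketch, not the repo's exact code:

```python
import importlib


def load_scoring_template(model_name: str):
    """Import scoring_templates/<model_name>.py and pull out its three exports."""
    module = importlib.import_module(f"scoring_templates.{model_name}")
    return module.reward_model, module.reward_tokenizer, module._score_example
```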
This is an example scoring template for the ArmoRM model.
```python
import torch
from constants import DEVICE
from torch.cuda.amp import autocast
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_model_path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

reward_model = AutoModelForSequenceClassification.from_pretrained(
    reward_model_path, trust_remote_code=True
).to(DEVICE)
reward_tokenizer = AutoTokenizer.from_pretrained(reward_model_path)


def _score_example(
    model,
    tokenizer,
    question,
    answer,
    device=DEVICE,
    truncation=True,
):
    messages = [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
    with torch.no_grad():
        with autocast():
            model = model.to(device)
            inputs = tokenizer.apply_chat_template(
                messages,
                return_tensors="pt",
                padding=True,
                truncation=truncation,
            ).to(device)
            outputs = model(inputs)
            reward = outputs.score.float().item()
    del inputs, outputs  # Explicitly free up memory to prevent OOM
    return reward
```
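For example, scoring a single completion with this template might look like the following (the review text here is made up):

```python
reward = _score_example(
    reward_model,
    reward_tokenizer,
    question="Write a movie review: ",
    answer="A sharp, funny film that never overstays its welcome.",
)
print(f"ArmoRM reward: {reward:.4f}")
```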
There is a Make command that will allow you to schedule a scoring job using SLURM. Note: you can pass a config file to the Make command (which is useful for scheduling multiple jobs with different configurations). If you don't pass a value, this defaults to `config.yaml`:

```bash
make score_dataset CONFIG=custom_config.yaml
```
After scoring a dataset, we calculate the average treatment effect using our rewritten counterfactual examples. This defaults to calculating the effect size between the rewrite of the rewrite and the rewrite (see the sketch after the config example below).

Here is an example setup in `config.yaml`: specify the key under which the reward was saved in the dataset and the `dataset_filename`.
```yaml
effects:
  dataset_name: "imdb" # Note: this is used to create the filename for the calculated effects
  concept: "length"
  score: "armorm"
  reward_key: "ArmoRM" # Note: this is the key for the reward in the dataset
  dataset_filename: "imdb_sentiment_complete_scored_20240919_152739.jsonl"
```
You can use a Make command for calculating treatment effects on SLURM-based systems:

```bash
make treatment_effect
```