Annotators...
In addition to preference, we also annotate "absolute safety", which bins responses into three categories: (1) the preferred response is safe and the other is not, (2) both responses are safe, (3) both responses are unsafe.
We do not include any examples from the fourth possible combination (the chosen response is unsafe while the other is safe), as we believe safer responses will also be better/preferred by humans.
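A minimal sketch of that exclusion rule, assuming each preference pair carries per-response safety labels (the record layout and field names below are hypothetical, not from the paper):

```python
from dataclasses import dataclass

# Hypothetical record for a safety-annotated preference pair;
# the paper does not specify a schema.
@dataclass
class PreferencePair:
    prompt: str
    chosen: str            # response preferred by the annotator
    rejected: str          # the other response
    chosen_is_safe: bool
    rejected_is_safe: bool

def keep_pair(pair: PreferencePair) -> bool:
    """Drop the fourth combination: chosen response unsafe while the other is safe."""
    return not (not pair.chosen_is_safe and pair.rejected_is_safe)

def filter_safety_pairs(pairs: list[PreferencePair]) -> list[PreferencePair]:
    return [p for p in pairs if keep_pair(p)]
```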
Human annotations were collected in weekly batches.
As we collected more preference data, our reward models improved, and we were able to train progressively better versions for Llama 2-Chat.
It is important to gather new preference data using the latest Llama 2-Chat iterations before starting a new tuning iteration.
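A rough sketch of that iterative loop, with the collection and training steps passed in as callables since the notes describe them only at a high level:

```python
from typing import Callable, List

def iterate_rlhf(
    chat_model,
    preference_data: List,
    collect_batch: Callable,       # gathers a new weekly batch of comparisons with the given model
    train_reward_model: Callable,  # fits reward model(s) on all preference data so far
    tune_chat_model: Callable,     # tunes the chat model against the latest reward model
    num_iterations: int = 5,
):
    """Illustrative outline only; not Meta's actual training code."""
    data = list(preference_data)
    for _ in range(num_iterations):
        # New preferences are gathered with the *latest* chat model, so the
        # reward model stays on-distribution for the next tuning round.
        data.extend(collect_batch(chat_model))
        reward_model = train_reward_model(data)
        chat_model = tune_chat_model(chat_model, reward_model)
    return chat_model
```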
We collected a large dataset of over 1 million binary comparisons, based on humans applying our specified guidelines.
Note that the number of tokens in prompts and answers differs depending on the text domain.
Meta preference data: more conversation turns and longer examples on average, compared to existing open-source preference datasets.
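As a small illustration of how those per-source statistics (turns, prompt/answer token counts) could be tabulated, assuming each example is a dict with hypothetical keys 'source', 'num_turns', 'prompt_tokens', and 'answer_tokens':

```python
from collections import defaultdict

def per_source_stats(examples):
    """Average turns and token counts per data source (illustrative only)."""
    sums = defaultdict(lambda: [0, 0, 0, 0])  # turns, prompt tokens, answer tokens, count
    for ex in examples:
        s = sums[ex["source"]]
        s[0] += ex["num_turns"]
        s[1] += ex["prompt_tokens"]
        s[2] += ex["answer_tokens"]
        s[3] += 1
    return {
        src: {
            "avg_turns": turns / n,
            "avg_prompt_tokens": prompt / n,
            "avg_answer_tokens": answer / n,
        }
        for src, (turns, prompt, answer, n) in sums.items()
    }
```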
The Helpfulness reward model is eventually trained on all Meta Helpfulness data, combined with equal parts of the remaining data sampled uniformly from Meta Safety and from the open-source datasets.
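A minimal sketch of that mixture, assuming the three pools are already loaded as lists; the total amount of "remaining" data is not specified here, so the sketch arbitrarily matches the size of the Meta Helpfulness pool:

```python
import random

def build_helpfulness_rm_data(meta_helpfulness, meta_safety, open_source, seed=0):
    """All Meta Helpfulness data plus equal parts sampled uniformly from
    Meta Safety and open-source preference data (illustrative sizes)."""
    rng = random.Random(seed)
    per_source = len(meta_helpfulness) // 2  # assumption: remaining data ~= helpfulness pool size
    mix = list(meta_helpfulness)
    mix += rng.sample(meta_safety, min(per_source, len(meta_safety)))
    mix += rng.sample(open_source, min(per_source, len(open_source)))
    rng.shuffle(mix)
    return mix
```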
SFT data
SFT data quality check
To validate our data quality, we carefully examined a set of 180 examples, manually comparing the annotations provided by humans with the samples generated by the model.
Sometimes, model output quality > human handwritten output quality.
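A small sketch of such a spot check: draw a fixed set of examples and place the human annotation next to a model generation for manual side-by-side review (the `generate` callable and dict keys are hypothetical):

```python
import random

def build_quality_check_set(sft_examples, generate, n=180, seed=0):
    """Pair human-written SFT answers with model samples for manual review.

    `sft_examples` is assumed to be a list of dicts with keys 'prompt' and
    'human_answer'; `generate(prompt)` returns a sampled model answer.
    """
    rng = random.Random(seed)
    sample = rng.sample(sft_examples, min(n, len(sft_examples)))
    return [
        {
            "prompt": ex["prompt"],
            "human_answer": ex["human_answer"],
            "model_answer": generate(ex["prompt"]),
        }
        for ex in sample
    ]
```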