GPTScore: A Novel Evaluation Framework for Text Generation Models
GPTScore: Evaluate as You Desire
This is the source code for the paper GPTScore: Evaluate as You Desire.
What is GPTScore?
GPTScore is a novel evaluation framework that utilizes the emergent abilities (e.g., zero-shot instruction) of Generative Pre-Trained models to Score generated texts.
The GPTScore evaluation framework is:
Customizable. Customized instructions and demonstrations enable the evaluation of new aspects without labeled datasets;
Multifaceted. One evaluator performs multifaceted evaluations;
Training-free.
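As a rough illustration of what scoring with a generative model means, the paper describes the score of a generated text as a weighted sum of the log-probabilities that the model assigns to its tokens, conditioned on a prompt. The sketch below assumes the per-token log-probabilities have already been obtained from some model. The function name `gpt_score`, the uniform default weights, and the toy probabilities are illustrative assumptions, not the repository's code.

```python
import math
from typing import Optional, Sequence


def gpt_score(token_logprobs: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of token log-probabilities (hypothetical helper).

    token_logprobs[t] is the log-probability of the t-th token of the text
    being scored, given the prompt and the preceding tokens. When no weights
    are given, every token gets weight 1/len, so the result is the mean.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    if weights is None:
        weights = [1.0 / len(token_logprobs)] * len(token_logprobs)
    if len(weights) != len(token_logprobs):
        raise ValueError("weights and token_logprobs must have the same length")
    return sum(w * lp for w, lp in zip(weights, token_logprobs))


# Toy example with made-up probabilities for a three-token text.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
score = gpt_score(logprobs)
```

A text whose tokens the model finds more likely receives a higher (less negative) score, which is what lets one evaluator rank candidate outputs without any training.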
What PLMs does GPTScore support?
We explored 19 Pre-trained Language Models (PLMs) ranging in size from 80M (FLAN-T5-Small) to 175B (GPT3) to design GPTScore. The PLMs studied in this paper are listed as follows:
| Model | Parameter | Evaluator Name | Model | Parameter | Evaluator Name |
| --- | --- | --- | --- | --- | --- |
| GPT3 | | | OPT | | |
| text-ada-001 | 350M | gpt3_score | OPT350M | 350M | opt350m_score |
| text-babbage-001 | 1.3B | gpt3_score | OPT-1.3B | 1.3B | opt1_3B_score |
| text-curie-001 | 6.7B | gpt3_score | OPT-6.7B | 6.7B | opt6_7B_score |
| text-davinci-001 | 175B | gpt3_score | OPT-13B | 13B | opt13B_score |
| text-davinci-003 | 175B | gpt3_score | OPT-66B | 66B | opt66B_score |
| FLAN-T5 | | | GPT2 | | |
| FT5-small | 80M | flan_small_score | GPT2-M | 355M | gpt2_medium_score |
| FT5-base | 250M | flan_base_score | GPT2-L | 774M | gpt2_large_score |
| FT5-L | 770M | flan_large_score | GPT2-XL | 1.5B | gpt2_xl_score |
| FT5-XL | 3B | flan_xl_score | GPT-J-6B | 6B | gptJ6B_score |
| FT5-XXL | 11B | flan_xxl_score | | | |
Evaluator Name is the name of the evaluator for the model listed in the corresponding Model column.
Usage
Use the GPT3-based model as the evaluator
Take evaluation with the GPT3 text-curie-001 model as an example.
gpt3_score: set to True so that the GPTScore evaluator uses a GPT3-based PLM.
gpt3model: set to curie so that the text-curie-001 model is used.
out_dir_name: set the folder for saving scoring results.
dataname: set the dataset name for evaluation (e.g., BAGEL).
aspect: set the aspect name to be evaluated (e.g., quality).
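The settings above could be passed as command-line options. The following is a minimal sketch of how such options might be parsed. The parser, the `str2bool` helper, the default values, and the output folder name `gptscore_results` are assumptions for illustration; the repository's actual scoring script may define them differently.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Option names mirror the settings listed above; defaults are assumed.
    parser = argparse.ArgumentParser(description="GPTScore settings (sketch)")
    parser.add_argument("--gpt3_score", type=str2bool, default=False,
                        help="use a GPT3-based PLM as the evaluator")
    parser.add_argument("--gpt3model", type=str, default="curie",
                        help="GPT3 variant, e.g. curie for text-curie-001")
    parser.add_argument("--out_dir_name", type=str, required=True,
                        help="folder for saving scoring results")
    parser.add_argument("--dataname", type=str, required=True,
                        help="dataset name, e.g. BAGEL")
    parser.add_argument("--aspect", type=str, required=True,
                        help="aspect to evaluate, e.g. quality")
    return parser


# Example: evaluate the 'quality' aspect on BAGEL with text-curie-001.
args = build_parser().parse_args([
    "--gpt3_score", "True",
    "--gpt3model", "curie",
    "--out_dir_name", "gptscore_results",  # placeholder folder name
    "--dataname", "BAGEL",
    "--aspect", "quality",
])
```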
1. GPTScore with Instruction and Demonstration
Set both use_demo and use_ist to True.
2. GPTScore with only Instruction
Set use_ist to True and use_demo to False.
3. GPTScore without Instruction and Demonstration
Set use_ist to False and use_demo to False.
For more information, visit the GitHub repository.
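The three settings above differ only in what is placed in front of the text to be scored. The sketch below shows one way the two flags could select the prompt parts. `build_prompt`, the instruction wording, and the demonstration text are hypothetical and do not reproduce the repository's templates.

```python
from typing import Sequence


def build_prompt(source: str,
                 instruction: str,
                 demonstrations: Sequence[str],
                 use_ist: bool,
                 use_demo: bool) -> str:
    """Assemble the evaluator prompt from optional parts (hypothetical)."""
    parts = []
    if use_ist:
        parts.append(instruction)      # task and aspect description
    if use_demo:
        parts.extend(demonstrations)   # in-context examples
    parts.append(source)               # the input being evaluated
    return "\n".join(parts)


instruction = "Generate a fluent sentence for the following data."  # made up
demos = ["Data: name=Alpha. Sentence: Alpha is a restaurant."]      # made up
source = "Data: name=Beta."

with_both = build_prompt(source, instruction, demos, use_ist=True, use_demo=True)
ist_only = build_prompt(source, instruction, demos, use_ist=True, use_demo=False)
neither = build_prompt(source, instruction, demos, use_ist=False, use_demo=False)
```

Keeping the two flags independent makes it easy to compare the same evaluator with and without instructions or demonstrations on the same dataset and aspect.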