GPTScore: A Novel Evaluation Framework for Text Generation Models #811

Open ShellLM opened 3 weeks ago

ShellLM commented 3 weeks ago

GPTScore: Evaluate as You Desire

This is the source code for the paper GPTScore: Evaluate as You Desire.

What is GPTScore?

GPTScore is a novel evaluation framework that utilizes the emergent abilities (e.g., zero-shot instruction) of Generative Pre-Trained models to Score generated texts.
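
As a rough illustration of what scoring with a generative model can mean: the GPTScore paper scores a generated text by a weighted sum of the log-probabilities that the evaluator model assigns to its tokens, conditioned on a prompt describing the task and the aspect. The sketch below assumes the per-token log-probabilities have already been obtained from some model. The function name `gpt_score`, the uniform default weights, and the example numbers are illustrative assumptions, not the repository's API.

```python
import math
from typing import Optional, Sequence


def gpt_score(token_logprobs: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-token log-probabilities of a generated text.

    token_logprobs[t] is assumed to be log p(h_t | h_<t, prompt), as returned
    by whichever evaluator model is used. With weights=None every token gets
    weight 1/m, so the score is the mean log-probability (an assumption here).
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    if weights is None:
        weights = [1.0 / len(token_logprobs)] * len(token_logprobs)
    if len(weights) != len(token_logprobs):
        raise ValueError("weights and token_logprobs must have equal length")
    return sum(w * lp for w, lp in zip(weights, token_logprobs))


# Hypothetical log-probabilities for two candidate texts under the same prompt.
fluent = [math.log(0.5), math.log(0.4), math.log(0.6)]
garbled = [math.log(0.05), math.log(0.1), math.log(0.02)]

# A text the evaluator finds more likely receives the higher score.
assert gpt_score(fluent) > gpt_score(garbled)
```

Scores computed this way are negative, and values closer to zero indicate text the evaluator considers more likely given the prompt.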

The GPTScore evaluation framework supports:

What PLMs does GPTScore support?

To design GPTScore, we explored 19 pre-trained language models (PLMs) ranging in size from 80M parameters (FLAN-T5-Small) to 175B parameters (GPT3). The PLMs studied in the paper are listed below:

| Model | Parameters | Evaluator Name | Model | Parameters | Evaluator Name |
|---|---|---|---|---|---|
| **GPT3** | | | **OPT** | | |
| text-ada-001 | 350M | gpt3_score | OPT350M | 350M | opt350m_score |
| text-babbage-001 | 1.3B | gpt3_score | OPT-1.3B | 1.3B | opt1_3B_score |
| text-curie-001 | 6.7B | gpt3_score | OPT-6.7B | 6.7B | opt6_7B_score |
| text-davinci-001 | 175B | gpt3_score | OPT-13B | 13B | opt13B_score |
| text-davinci-003 | 175B | gpt3_score | OPT-66B | 66B | opt66B_score |
| **FLAN-T5** | | | **GPT2** | | |
| FT5-small | 80M | flan_small_score | GPT2-M | 355M | gpt2_medium_score |
| FT5-base | 250M | flan_base_score | GPT2-L | 774M | gpt2_large_score |
| FT5-L | 770M | flan_large_score | GPT2-XL | 1.5B | gpt2_xl_score |
| FT5-XL | 3B | flan_xl_score | GPT-J-6B | 6B | gptJ6B_score |
| FT5-XXL | 11B | flan_xxl_score | | | |

Evaluator Name is the name of the evaluator that corresponds to the model listed in the Model column of the same row.
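
The model-to-evaluator mapping in the table above can be captured as a small lookup, which may help when scripting runs over several evaluators. This is only a sketch: the dictionary `EVALUATOR_NAMES` and the helper `evaluator_for` are hypothetical names introduced here, and the entries are copied from the table rather than read from the repository's code.

```python
# Model name -> evaluator name, copied from the table above.
EVALUATOR_NAMES = {
    # GPT3 (all GPT3 models share one evaluator name)
    "text-ada-001": "gpt3_score",
    "text-babbage-001": "gpt3_score",
    "text-curie-001": "gpt3_score",
    "text-davinci-001": "gpt3_score",
    "text-davinci-003": "gpt3_score",
    # OPT
    "OPT350M": "opt350m_score",
    "OPT-1.3B": "opt1_3B_score",
    "OPT-6.7B": "opt6_7B_score",
    "OPT-13B": "opt13B_score",
    "OPT-66B": "opt66B_score",
    # FLAN-T5
    "FT5-small": "flan_small_score",
    "FT5-base": "flan_base_score",
    "FT5-L": "flan_large_score",
    "FT5-XL": "flan_xl_score",
    "FT5-XXL": "flan_xxl_score",
    # GPT2 and GPT-J
    "GPT2-M": "gpt2_medium_score",
    "GPT2-L": "gpt2_large_score",
    "GPT2-XL": "gpt2_xl_score",
    "GPT-J-6B": "gptJ6B_score",
}


def evaluator_for(model: str) -> str:
    """Return the evaluator name for a model listed in the table."""
    try:
        return EVALUATOR_NAMES[model]
    except KeyError:
        known = ", ".join(sorted(EVALUATOR_NAMES))
        raise KeyError(f"unknown model {model!r}; known models: {known}") from None


print(evaluator_for("OPT-13B"))
```

The usage examples pass `--gpt3_score True` for a GPT3 evaluator, which suggests the evaluator name doubles as a command-line flag; whether that holds for every evaluator is not confirmed here.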

Usage

Use the GPT3-based model as the evaluator

The following examples use the GPT3 text-curie-001 model as the evaluator.

1. GPTScore with Instruction and Demonstration

Set both use_demo and use_ist to True.

python score_d2t.py \
--dataname "BAGEL" \
--use_demo True \
--use_ist True \
--gpt3_score True \
--gpt3model "curie" \
--out_dir_name "gpt3Score_based" \
--aspect 'quality'

2. GPTScore with only Instruction

Set use_ist to True and use_demo to False.

python score_d2t.py \
--dataname "BAGEL" \
--use_demo False \
--use_ist True \
--gpt3_score True \
--gpt3model "curie" \
--out_dir_name "gpt3Score_based" \
--aspect 'quality'

3. GPTScore with neither Instruction nor Demonstration

Set both use_ist and use_demo to False.

python score_d2t.py \
--dataname "BAGEL" \
--use_demo False \
--use_ist False \
--gpt3_score True \
--gpt3model "curie" \
--out_dir_name "gpt3Score_based" \
--aspect 'quality'
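
The three settings differ only in the values of use_demo and use_ist, so the commands can be generated rather than typed by hand. The sketch below builds the argument list for each setting using only the flags shown in the commands above. The helper `build_command` and the `SETTINGS` dictionary are hypothetical names introduced here, not part of the repository.

```python
import shlex
from typing import List


def build_command(use_demo: bool, use_ist: bool,
                  dataname: str = "BAGEL",
                  gpt3model: str = "curie",
                  out_dir_name: str = "gpt3Score_based",
                  aspect: str = "quality") -> List[str]:
    """Assemble the score_d2t.py command for one prompt setting."""
    return [
        "python", "score_d2t.py",
        "--dataname", dataname,
        "--use_demo", str(use_demo),   # "True" or "False", as in the examples
        "--use_ist", str(use_ist),
        "--gpt3_score", "True",
        "--gpt3model", gpt3model,
        "--out_dir_name", out_dir_name,
        "--aspect", aspect,
    ]


# (use_demo, use_ist) for the three settings described above.
SETTINGS = {
    "instruction and demonstration": (True, True),
    "instruction only": (False, True),
    "neither": (False, False),
}

for name, (use_demo, use_ist) in SETTINGS.items():
    print(f"# {name}")
    print(shlex.join(build_command(use_demo, use_ist)))
```

To actually run a setting, the list could be passed to `subprocess.run(cmd, check=True)` from the repository directory, assuming the script and any required API credentials are already set up.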

For more information, visit the GitHub repository.

Suggested labels

None

ShellLM commented 3 weeks ago

Related content

498 similarity score: 0.9

499 similarity score: 0.89

309 similarity score: 0.89

383 similarity score: 0.89

762 similarity score: 0.89

456 similarity score: 0.89