luohongyin / SAIL

SAIL: Search Augmented Instruction Learning
GNU General Public License v3.0

How to evaluate SAIL? #8

Open Luoyang144 opened 8 months ago

Luoyang144 commented 8 months ago

Hello, I am curious about how SAIL was evaluated. Was it evaluated using GPT-4? Was all of the benchmark data used for evaluation?

luohongyin commented 8 months ago

Hi, thanks for asking! Details about evaluation can be found in https://aclanthology.org/2023.findings-emnlp.242.pdf

Luoyang144 commented 8 months ago

Thanks for the reply. In Section 3.1, the title is "Automatic Evaluation with GPT-4", but I didn't see the evaluation details. Did you evaluate the results on all of the test data? That would require a significant amount of time (and money).

luohongyin commented 8 months ago

I see. GPT-4 is only used for the evaluation on Question-80.

More details can be found here, but I believe they have upgraded many things since I used it.

Luoyang144 commented 8 months ago

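Thanks for the reply. I wonder whether exact match (EM) may cause misjudgments during evaluation: a correct answer that is worded differently from the reference would be scored as wrong. This situation seems unavoidable.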
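
To make the concern concrete, here is a minimal sketch of how EM can misjudge an answer. It assumes a SQuAD-style normalization (lowercasing, stripping punctuation and articles); the function names `normalize` and `exact_match` are hypothetical, and this is not necessarily the metric implementation used for SAIL.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization, chosen only for illustration.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """Return True only if the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


# Surface differences such as case, articles, and punctuation are tolerated.
print(exact_match("The Eiffel Tower.", "eiffel tower"))   # True

# A correct but more verbose answer is still counted as wrong.
print(exact_match("It is in Paris, France.", "Paris"))    # False
```

The second call shows the misjudgment in question: the prediction contains the right answer, but EM rejects it because the strings differ after normalization.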