xinbowu2 closed this issue 5 months ago
Thanks for your interest in the code. I've now added the code that we used to evaluate GPT-3 on the story analogies and analyze the results. eval_GPT3_story_analogies.py was used to evaluate GPT-3. Because GPT-3's responses tended to have variable formatting, analyze_GPT3_story_analogies.py presents each response along with the correct answer and prompts the user to judge whether the answer is correct (rather than attempting to automatically parse the results). At the time we performed the evaluation, GPT-4 was not yet available through the API, so that evaluation was performed manually through the ChatGPT web interface.
Disclaimer: I can't promise that this code will run out of the box. It may need to be updated due to changes in the OpenAI API, and some models may be deprecated.
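For anyone adapting this workflow: the snippet below is not the repo's actual script, just a minimal sketch of the manual-grading loop described above (present each response with the correct answer, ask a human judge, tally accuracy). The field names `response` and `correct_answer` are hypothetical, and the judgment callback is injectable so the loop can also be driven non-interactively.

```python
def grade_responses(responses, get_judgment=input):
    """Present each model response next to the correct answer and ask a
    human judge whether it is correct. Returns overall accuracy.

    responses: list of dicts with (hypothetical) keys 'response' and
    'correct_answer'. get_judgment: callable like input(), injectable
    so the loop can be tested without a live terminal.
    """
    n_correct = 0
    for i, item in enumerate(responses, start=1):
        print(f"--- Item {i} ---")
        print("Model response:", item["response"])
        print("Correct answer:", item["correct_answer"])
        judgment = ""
        # Keep asking until the judge enters a valid y/n answer.
        while judgment not in ("y", "n"):
            judgment = get_judgment("Correct? [y/n] ").strip().lower()
        if judgment == "y":
            n_correct += 1
    accuracy = n_correct / len(responses)
    print(f"Accuracy: {accuracy:.2%} ({n_correct}/{len(responses)})")
    return accuracy
```

A non-interactive run might look like `grade_responses(data, get_judgment=lambda prompt: "y")`, which accepts every response.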
Thank you!
Could you let me know how to get the Rattermann.xlsx file used for the story analogies?
Please see the instructions here https://github.com/taylorwwebb/emergent_analogies_LLM/tree/main/story_analogies
Hi, thanks for sharing the code and datasets. I'd like to know how to evaluate models on the story analogies.