xinbowu2 closed this issue 5 months ago
Thanks for your interest in the code. I've now added the code that we used to evaluate GPT-3 on the story analogies and analyze the results. eval_GPT3_story_analogies.py was used to evaluate GPT-3. Because GPT-3's responses tended to have variable formatting, analyze_GPT3_story_analogies.py presents each response along with the correct answer and prompts the user to judge whether the answer is correct (rather than attempting to automatically parse the results). At the time we performed the evaluation, GPT-4 was not yet available through the API, so that evaluation was performed manually through the ChatGPT web interface.
Disclaimer: I can't promise that this code will run out of the box. It may need to be updated due to changes in the OpenAI API, and some models may be deprecated.
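For anyone adapting this workflow: the snippet below is not the repo's actual script, just a minimal sketch of the manual-grading loop described above (present each response with the correct answer, ask a human judge, tally accuracy). The field names `response` and `correct_answer` are hypothetical, and the judgment callback is injectable so the loop can also be driven non-interactively.

```python
def grade_responses(responses, get_judgment=input):
    """Present each model response next to the correct answer and ask a
    human judge whether it is correct. Returns overall accuracy.

    responses: list of dicts with (hypothetical) keys 'response' and
    'correct_answer'. get_judgment: callable like input(), injectable
    so the loop can be tested without a live terminal.
    """
    n_correct = 0
    for i, item in enumerate(responses, start=1):
        print(f"--- Item {i} ---")
        print("Model response:", item["response"])
        print("Correct answer:", item["correct_answer"])
        judgment = ""
        # Keep asking until the judge enters a valid y/n answer.
        while judgment not in ("y", "n"):
            judgment = get_judgment("Correct? [y/n] ").strip().lower()
        if judgment == "y":
            n_correct += 1
    accuracy = n_correct / len(responses)
    print(f"Accuracy: {accuracy:.2%} ({n_correct}/{len(responses)})")
    return accuracy
```

A non-interactive run might look like `grade_responses(data, get_judgment=lambda prompt: "y")`, which accepts every response.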
Thank you!
Could you let me know how to get the Rattermann.xlsx file used for the story analogies?
Please see the instructions here https://github.com/taylorwwebb/emergent_analogies_LLM/tree/main/story_analogies
Hi, thanks for sharing the code and datasets. I'd like to know how to evaluate models on the story analogies.