openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Idea for Evals: GPT matches lyrics with song name #390

Status: Closed (Ein-Tim closed this issue 6 days ago)

Ein-Tim commented 1 year ago

Eval description

An eval checking whether GPT-3.5 & GPT-4 can accurately match given lyrics to the song name.

Problem and motivation

After some testing with GPT-3.5 & GPT-4 (via ChatGPT+), I was honestly disappointed at how badly the models performed on this task, especially because a simple Google search of the lyrics nearly always brought up the correct song.

Examples

Example one: ❌❌

Input: From which song are the following lyrics „I’ve been drinking too much, for sure“?
Answer (GPT-3.5-turbo): These lyrics are from the song "Blinding Lights" by The Weeknd.
Correct answer: "Hangover"
Input: From which song are the following lyrics „I’ve been drinking too much, for sure“?
Answer (GPT-4): The lyrics "I've been drinking too much, for sure" are from the song "Roses" by The Chainsmokers featuring Rozes. The song was released in 2015 and became a big hit for the electronic music duo.
Correct answer: "Hangover"

Example two: ❌❌

Input: From which song are the following lyrics „Poured the old ginger beer down the sink“?
Answer (GPT-3.5-turbo): The lyrics "Poured the old ginger beer down the sink" are from the song "Being for the Benefit of Mr. Kite!" by The Beatles.
Correct answer: "Supermarket Flowers"
Input: From which song are the following lyrics „Poured the old ginger beer down the sink“?
Answer (GPT-4): The lyrics "Poured the old ginger beer down the sink" are from the song "Wings" by Birdy. This song is part of her second studio album "Fire Within," which was released in 2013. "Wings" is one of Birdy's popular songs and showcases her unique voice and songwriting skills.
Correct answer: "Supermarket Flowers"

Example three: ❌✅

Input: From which song are the following lyrics „I’m criticized, but all your bullets ricochet“?
Answer (GPT-3.5-turbo): The lyrics "I'm criticized, but all your bullets ricochet" are from the song "Roar" by Katy Perry.
Correct answer: "Titanium"
Input: From which song are the following lyrics „I’m criticized, but all your bullets ricochet“?
Answer (GPT-4): The lyrics "I'm criticized, but all your bullets ricochet" are from the song "Titanium" by David Guetta featuring Sia. The song was released in 2011 and became a big hit, showcasing Sia's powerful vocals and David Guetta's production skills.
Correct answer: "Titanium"
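
The three examples above can be tallied with a short script. This is only a minimal sketch and not part of the evals framework: the `includes_title` and `tally` helpers are hypothetical names, the answers are the first sentences of the responses quoted above, and a case-insensitive substring check is assumed as the scoring rule because the models answer in full sentences rather than with the bare title.

```python
# Each entry: (correct title, GPT-3.5-turbo answer, GPT-4 answer).
# Answers are the first sentences of the responses quoted in the examples.
EXAMPLES = [
    (
        "Hangover",
        'These lyrics are from the song "Blinding Lights" by The Weeknd.',
        'The lyrics "I\'ve been drinking too much, for sure" are from the song '
        '"Roses" by The Chainsmokers featuring Rozes.',
    ),
    (
        "Supermarket Flowers",
        'The lyrics "Poured the old ginger beer down the sink" are from the song '
        '"Being for the Benefit of Mr. Kite!" by The Beatles.',
        'The lyrics "Poured the old ginger beer down the sink" are from the song '
        '"Wings" by Birdy.',
    ),
    (
        "Titanium",
        'The lyrics "I\'m criticized, but all your bullets ricochet" are from the '
        'song "Roar" by Katy Perry.',
        'The lyrics "I\'m criticized, but all your bullets ricochet" are from the '
        'song "Titanium" by David Guetta featuring Sia.',
    ),
]


def includes_title(answer: str, title: str) -> bool:
    """Hypothetical scoring rule: the title appears anywhere in the answer."""
    return title.casefold() in answer.casefold()


def tally(examples):
    """Count correct answers per model over (title, gpt35, gpt4) triples."""
    scores = {"gpt-3.5-turbo": 0, "gpt-4": 0}
    for title, answer_35, answer_4 in examples:
        scores["gpt-3.5-turbo"] += int(includes_title(answer_35, title))
        scores["gpt-4"] += int(includes_title(answer_4, title))
    return scores


if __name__ == "__main__":
    for model, correct in tally(EXAMPLES).items():
        print(f"{model}: {correct}/{len(EXAMPLES)} correct")
```

Under this assumed rule the tally agrees with the ❌/✅ marks above: GPT-3.5-turbo gets none of the three right and GPT-4 gets one.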

Example eval prompt

{"input": [{"role": "system", "content": "You are LyricsGPT. A helpful AI chatbot that is given lyrics from songs and you will respond only with the name of the song the lyrics are coming from. Don't explain your choice. Don't add the artist(s) of the song to your answer."}, {"role": "user", "content": "From which song are the following lyrics „I’m criticized, but all your bullets ricochet“?"}], "ideal": ["Titanium"]}
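
For anyone who wants to pick this up, more samples in the same shape could be generated from (lyric, title) pairs with a small script. This is a rough sketch under assumptions: `make_sample` and `to_jsonl` are hypothetical helper names, the system prompt is copied from the example prompt above, and the pairs are the three examples from this issue. It does not call the evals framework itself; it only writes one JSON object per line.

```python
import json

# System prompt copied verbatim from the example eval prompt in this issue.
SYSTEM_PROMPT = (
    "You are LyricsGPT. A helpful AI chatbot that is given lyrics from songs "
    "and you will respond only with the name of the song the lyrics are coming "
    "from. Don't explain your choice. Don't add the artist(s) of the song to "
    "your answer."
)

# (lyric, correct title) pairs taken from the examples in this issue.
PAIRS = [
    ("I’ve been drinking too much, for sure", "Hangover"),
    ("Poured the old ginger beer down the sink", "Supermarket Flowers"),
    ("I’m criticized, but all your bullets ricochet", "Titanium"),
]


def make_sample(lyric: str, title: str) -> dict:
    """Build one sample with the same keys as the example prompt."""
    question = f"From which song are the following lyrics „{lyric}“?"
    return {
        "input": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "ideal": [title],
    }


def to_jsonl(pairs) -> str:
    """Serialize the pairs as JSON Lines, one sample per line."""
    # ensure_ascii=False keeps the typographic quotes readable in the file.
    return "\n".join(
        json.dumps(make_sample(lyric, title), ensure_ascii=False)
        for lyric, title in pairs
    )


if __name__ == "__main__":
    print(to_jsonl(PAIRS))
```

Keeping only the title in `ideal` matches the system prompt's instruction to answer with the song name alone and no artist.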

Is this something you're interested in working on?

I'd really like to provide this eval; however, I currently have neither the time nor sufficient technical skills to do so. I'm therefore sharing my idea here in the hope that someone will open a PR based on it!

Ein-Tim commented 1 year ago

Converted to a discussion: https://github.com/openai/evals/discussions/391

Ein-Tim commented 1 year ago

As written by @andrew-openai in https://github.com/openai/evals/issues/632, I'm reopening this issue and have changed the title accordingly. Please also apply the https://github.com/openai/evals/labels/Idea%20for%20Eval label to it.