prolego-team / neo-sophia

Applying the latest advancements in AI and machine learning to solve complex business problems.
BSD 3-Clause "New" or "Revised" License
66 stars 26 forks source link

fix bank agent eval question #94

Closed bdzimmer closed 1 year ago

bdzimmer commented 1 year ago

The old question about accounts opened in the past year is problematic because the answer changes with today's date.

Initial new question: "How many people have opened a savings account between 2022-08-01 and 2023-08-01?"

A naive query only looks at distinct account numbers:

SELECT COUNT(DISTINCT account_number)
FROM savings_account
WHERE account_open_date BETWEEN '2022-08-01' AND '2023-08-01'

34 unique savings account were opened. But we want unique people:

SELECT COUNT(DISTINCT guid)
FROM savings_account
JOIN products ON savings_account.account_number = products.account_number
WHERE account_open_date BETWEEN '2022-08-01' AND '2023-08-01'

32 people!

Update 1:

Ran a comparison between "How many people have opened..." and "How many unique people opened...". Results suggest that the both GPT-4 and GPT-3.5 do better with the second question. Results for 10 runs:

image

For now, choosing "How many unique people" as the question but it is pretty interesting to note the difference in performance.

Update 2:

Complete results for 10 runs across all systems and questions:

image

image