The old question about accounts opened in the past year is problematic because the answer changes with today's date.
Initial new question: "How many people have opened a savings account between 2022-08-01 and 2023-08-01?"
A naive query only looks at distinct account numbers:
SELECT COUNT(DISTINCT account_number)
FROM savings_account
WHERE account_open_date BETWEEN '2022-08-01' AND '2023-08-01'
34 unique savings account were opened. But we want unique people:
SELECT COUNT(DISTINCT guid)
FROM savings_account
JOIN products ON savings_account.account_number = products.account_number
WHERE account_open_date BETWEEN '2022-08-01' AND '2023-08-01'
32 people!
Update 1:
Ran a comparison between "How many people have opened..." and "How many unique people opened...". Results suggest that the both GPT-4 and GPT-3.5 do better with the second question. Results for 10 runs:
For now, choosing "How many unique people" as the question but it is pretty interesting to note the difference in performance.
Update 2:
Complete results for 10 runs across all systems and questions:
The old question about accounts opened in the past year is problematic because the answer changes with today's date.
Initial new question: "How many people have opened a savings account between 2022-08-01 and 2023-08-01?"
A naive query only looks at distinct account numbers:
34 unique savings account were opened. But we want unique people:
32 people!
Update 1:
Ran a comparison between "How many people have opened..." and "How many unique people opened...". Results suggest that the both GPT-4 and GPT-3.5 do better with the second question. Results for 10 runs:
For now, choosing "How many unique people" as the question but it is pretty interesting to note the difference in performance.
Update 2:
Complete results for 10 runs across all systems and questions: