noamoss commented 1 year ago

Practice a way to provide gpt3.5turbo a claim + CSV file. Expected outputs:

csv scheme (SQLite)
query to validate the claim based on the csv scheme. Possible outputs:
- True (claim validated by query) + relevant data
- False (claim) + relevant data
- Could not be answered
query result

cases

3 claims that could be approved by the provided csv
3 claims that could be disapproved by the provided csv
3 claims that could not be answered given the provided csv

noamoss commented 1 year ago

@akariv please review and approve (or add relevant info) before I start working on this.

akariv commented 1 year ago

The CSV scheme & the claim are the inputs to GPT, the expected output is the sqlite query that should return the result.

A few things (edge cases?), maybe for later:

What if the claim is mostly correct - e.g. claim is "unemployment grew by 55 thousand people", query result is "55,214" - do we mark it as true or false?
What if we need multiple sql queries to answer the claim? Can we avoid this situation?

Finally, we would like to aggregate the individual results (true/false/data etc.) for the individual claims and generate a verification text for the original claim.

noamoss commented 1 year ago

@akariv @OriHoch

I would be glad for your thoughts here. I pushed a POC of generating an SQLite scheme and locating the header.

While trying to figure out the 2nd part of this task (transforming an open text query into a SQLite query for an unknown table structure), I thought of two directions.

I will use the following inputs to explain them:

Example SQLite scheme: "CREATE TABLE example_table ("_id" INTEGER, "Quarter (Q)" TEXT, "Year-Month" TEXT, "Total Appeals Received" INTEGER, "Director Appeals" INTEGER, "Employee Appeals" INTEGER, "Employer Appeals" INTEGER, "Notice of Contravention Appeals" INTEGER, "Other Appeals" INTEGER);"
Example claim: There was a decline in employee appeals in the year 2014 compared to the previous years.

Method A - generate ad-hoc query

The advantage is that it is a relatively simple and fast process. The disadvantages concern the claim that many details and contexts need to be added to this prompt to allow the LLM to know whether he can answer. It is also problematic in cases of ambiguities (for example, "decline in 2014..." could be understood in at least two ways - and this method does not allow us to clarify the meaning of the claim.

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=[
    {
      "role": "system",
      "content": "Act as a professional researcher. Given the following SQLite scheme, help to write the SQLite query to verify the subsequent claim. \n\n- Write the SQLite query without notes if the claim can be verified fully. Status: 'ok'.\n- If only part of the claim could be verified, provide the relevant query and add a note to explain what could not be verified. Status: 'partially'.\n- If the provided table cannot verify the claim, return an string as query, and an explanation as a note. Status: 'Failed'.\n\nReview your output before showing it and ensure you did everything you can to write a query to verify the claim.\n\nYour output should be just a tuple of 3 elements: result, query and notes.\n\n# expected output\n```python\n(<result>,<query>, <notes>)\n```"
    },
    {
      "role": "user",
      "content": "## Claim\nThere was a decline in employee appeals in the year 2014 compared to the previous years.\n\n## SQLite scheme \nCREATE TABLE example_table (\"_id\" INTEGER, \"Quarter (Q)\" TEXT, \"Year-Month\" TEXT, \"Total Appeals Received\" INTEGER, \"Director Appeals\" INTEGER, \"Employee Appeals\" INTEGER, \"Employer Appeals\" INTEGER, \"Notice of Contravention Appeals\" INTEGER, \"Other Appeals\" INTEGER);"
    },
    {
      "role": "assistant",
      "content": "('partially', \n\"\"\"\nSELECT \"Year-Month\", SUM(\"Employee Appeals\") as Total_Employee_Appeals\nFROM example_table\nGROUP BY \"Year-Month\"\nORDER BY \"Year-Month\";\n\"\"\",\n'The query will provide the total number of employee appeals for each month-year. However, without knowing the specific years to compare with 2014, it is not possible to fully verify the claim.')"
    }
  ],
  temperature=0,
  max_tokens=4395,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

example output

('partially', 
"""
SELECT "Year-Month", SUM("Employee Appeals") as Total_Employee_Appeals
FROM example_table
GROUP BY "Year-Month"
ORDER BY "Year-Month";
""",
'The query will provide the total number of employee appeals for each month-year. However, without knowing the specific years to compare with 2014, it is not possible to fully verify the claim.')

Method B - predefine potential questions per table in advance

by using the openai api to generate a list/dictionary of potential parameterized questions, with the relevant queries, as a first step. The advantage is that it is a very unambiguous process. The disadvantages concern process complexity (adding steps and additional information that needs to be stored and retrieved) and a particular potential for error if the model did not define a question that could be answered in advance.

Step 1:

Given the following SQLite scheme, list all the questions that could be answered in a natural language style.  Use parameters (like <year> or <type>) instead of replicating similar questions on different values (like 2020, 2019, 2018...).

Verify you included all potential questions in your output.
For each question generate the relevant query.

Output example:
```python 
[(question, sqlite query), 
 (question, sqlite query), 
 (question, sqlite query), 
...
]```

## SQLite scheme
{{sqlite scheme}}

output:

# Generating the SQLite queries for each of the questions

queries = [
    # 1. What is the total number of appeals received in <year>?
    ("What is the total number of appeals received in <year>?",
     "SELECT SUM(`Total Appeals Received`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 2. How many director appeals were there in <year>?
    ("How many director appeals were there in <year>?",
     "SELECT SUM(`Director Appeals`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 3. How many employee appeals were made in <year>?
    ("How many employee appeals were made in <year>?",
     "SELECT SUM(`Employee Appeals`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 4. How many employer appeals were made in <year>?
    ("How many employer appeals were made in <year>?",
     "SELECT SUM(`Employer Appeals`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 5. How many "Notice of Contravention" appeals were there in <year>?
    ("How many 'Notice of Contravention' appeals were there in <year>?",
     "SELECT SUM(`Notice of Contravention Appeals`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 6. How many other appeals were there in <year>?
    ("How many other appeals were there in <year>?",
     "SELECT SUM(`Other Appeals`) FROM example_table WHERE `Year-Month` LIKE '<year>-%'"),

    # 7. In which quarter of <year> were the most appeals received?
    ("In which quarter of <year> were the most appeals received?",
     "SELECT `Quarter (Q)`, SUM(`Total Appeals Received`) as total FROM example_table WHERE `Year-Month` LIKE '<year>-%' GROUP BY `Quarter (Q)` ORDER BY total DESC LIMIT 1"),

    # 8. In which month of <year> were the most director appeals made?
    ("In which month of <year> were the most director appeals made?",
     "SELECT `Year-Month`, `Director Appeals` FROM example_table WHERE `Year-Month` LIKE '<year>-%' ORDER BY `Director Appeals` DESC LIMIT 1"),

    # 9. How many total appeals were received in the month of <year-month>?
    ("How many total appeals were received in the month of <year-month>?",
     "SELECT `Total Appeals Received` FROM example_table WHERE `Year-Month` = '<year-month>'"),

    # 10. How many appeals in total have been received until now?
    ("How many appeals in total have been received until now?",
     "SELECT SUM(`Total Appeals Received`) FROM example_table"),

    # 11. How many director appeals in total have been received until now?
    ("How many director appeals in total have been received until now?",
     "SELECT SUM(`Director Appeals`) FROM example_table"),

    # 12. How many employee appeals in total have been received until now?
    ("How many employee appeals in total have been received until now?",
     "SELECT SUM(`Employee Appeals`) FROM example_table"),

    # 13. How many employer appeals in total have been received until now?
    ("How many employer appeals in total have been received until now?",
     "SELECT SUM(`Employer Appeals`) FROM example_table"),

    # 14. How many "Notice of Contravention" appeals in total have been received until now?
    ("How many 'Notice of Contravention' appeals in total have been received until now?",
     "SELECT SUM(`Notice of Contravention Appeals`) FROM example_table"),

    # 15. How many other appeals in total have been received until now?
    ("How many other appeals in total have been received until now?",
     "SELECT SUM(`Other Appeals`) FROM example_table"),

    # 16. In which year were the most appeals received?
    ("In which year were the most appeals received?",
     "SELECT SUBSTR(`Year-Month`, 1, 4) as Year, SUM(`Total Appeals Received`) as total FROM example_table GROUP BY Year ORDER BY total DESC LIMIT 1")
]

Step 2. Call the openai api with the list of questions, and verify if the claim could be verified by using any of them or any combination of them.

noamoss commented 1 year ago

self note:

get claim
explore for possible meaning varations
for each version, define required/supportive data points
for each data point + csv, query for the daya point

noamoss commented 1 year ago

Relevant example for light implementation https://www.pragnakalp.com/data-extraction-from-tabular-data-with-chatgpt/

whiletrue-industries / odds

calim + csv --> scheme --> query --> result #38

Example claim: There was a decline in employee appeals in the year 2014 compared to the previous years.

Method A - generate ad-hoc query

example output

Method B - predefine potential questions per table in advance