simonw / sqlite-utils-ask

Ask questions of your data with LLM assistance
Apache License 2.0
22 stars 0 forks source link

Option that passes the results back to the LLM so it can generate a prose answer for you - `-p/--prose` #7

Open simonw opened 3 months ago

simonw commented 3 months ago

Prototype:

diff --git a/sqlite_utils_ask.py b/sqlite_utils_ask.py
index 6e18907..92f2cb4 100644
--- a/sqlite_utils_ask.py
+++ b/sqlite_utils_ask.py
@@ -18,7 +18,8 @@ def register_commands(cli):
     @click.argument("question")
     @click.option("model_id", "-m", "--model", help="LLM model to use")
     @click.option("-v", "--verbose", is_flag=True, help="Verbose output")
-    def ask(path, question, model_id, verbose):
+    @click.option("-p", "--prose", is_flag=True, help="Provide an answer in prose")
+    def ask(path, question, model_id, verbose, prose):
         "Ask a question of your data"
         # Open in read-only mode
         conn = sqlite3.connect("file:{}?mode=ro".format(str(path)), uri=True)
@@ -94,6 +95,15 @@ def register_commands(cli):
             if verbose:
                 click.echo(conversation.responses, err=True)

+        if prose:
+            prompt = sql + "\n\n"
+            prompt = "Given these results from that SQL query:\n\n"
+            prompt += json.dumps(results, indent=4, default=repr)
+            prompt += "\n\nAnswer the user's question: " + question
+            for token in model.prompt(prompt):
+                print(token, end="")
+            print()
+

 _pattern = r"```sql\n(.*?)\n```"

Demo:

sqlite-utils ask content.db 'what are the most interesting repos?' -p

Output:

{
    "sql": "SELECT\n  repos.name AS name,\n  repos.full_name AS full_name,\n  users.login AS owner,\n  repos.description AS description,\n  repos.stargazers_count,\n  repos.created_at AS created_at,\n  (\n    SELECT\n      SUM(downloads)\n    FROM\n      stats\n    WHERE\n      stats.package = repos.name\n      AND stats.date > DATE('now', '-7 days')\n  ) AS downloads_this_week,\n  (\n    SELECT\n      COUNT(*)\n    FROM\n      plugin_repos\n    WHERE\n      repo = repos.full_name\n  ) AS is_plugin,\n  (\n    SELECT\n      COUNT(*)\n    FROM\n      tool_repos\n    WHERE\n      repo = repos.full_name\n  ) AS is_tool\nFROM\n  repos\nJOIN\n  users ON users.id = repos.owner\nWHERE\n  repos.stargazers_count > (\n    SELECT AVG(stargazers_count) FROM repos\n  )\nORDER BY\n  repos.stargazers_count DESC;",
    "results": [
        {
            "name": "datasette",
            "full_name": "simonw/datasette",
            "owner": "simonw",
            "description": "An open source multi-tool for exploring and publishing data",
            "stargazers_count": 9182,
            "created_at": "2017-10-23T00:39:03Z",
            "downloads_this_week": 4643,
            "is_plugin": 0,
            "is_tool": 0
        },
        {
            "name": "llm",
            "full_name": "simonw/llm",
            "owner": "simonw",
            "description": "Access large language models from the command-line",
            "stargazers_count": 3512,
            "created_at": "2023-04-01T21:16:57Z",
            "downloads_this_week": 1125,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-utils",
            "full_name": "simonw/sqlite-utils",
            "owner": "simonw",
            "description": "Python CLI utility and library for manipulating SQLite databases",
            "stargazers_count": 1575,
            "created_at": "2018-07-14T03:21:46Z",
            "downloads_this_week": 21452,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-vss",
            "full_name": "asg017/sqlite-vss",
            "owner": "asg017",
            "description": "A SQLite extension for efficient vector search, based on Faiss!",
            "stargazers_count": 1494,
            "created_at": "2023-01-03T23:27:31Z",
            "downloads_this_week": 1203,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "shot-scraper",
            "full_name": "simonw/shot-scraper",
            "owner": "simonw",
            "description": "A command-line utility for taking automated screenshots of websites",
            "stargazers_count": 1490,
            "created_at": "2022-03-08T21:21:02Z",
            "downloads_this_week": 1826,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "csvs-to-sqlite",
            "full_name": "simonw/csvs-to-sqlite",
            "owner": "simonw",
            "description": "Convert CSV files into a SQLite database",
            "stargazers_count": 871,
            "created_at": "2017-11-13T06:38:21Z",
            "downloads_this_week": 77,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "django-sql-dashboard",
            "full_name": "simonw/django-sql-dashboard",
            "owner": "simonw",
            "description": "Django app for building dashboards using raw SQL queries",
            "stargazers_count": 420,
            "created_at": "2021-03-13T03:38:23Z",
            "downloads_this_week": 412,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-lines",
            "full_name": "asg017/sqlite-lines",
            "owner": "asg017",
            "description": "A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)",
            "stargazers_count": 374,
            "created_at": "2022-04-25T19:11:46Z",
            "downloads_this_week": 38,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "github-to-sqlite",
            "full_name": "dogsheep/github-to-sqlite",
            "owner": "dogsheep",
            "description": "Save data from GitHub to a SQLite database",
            "stargazers_count": 361,
            "created_at": "2019-09-08T02:50:28Z",
            "downloads_this_week": 62,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-html",
            "full_name": "asg017/sqlite-html",
            "owner": "asg017",
            "description": "A SQLite extension for querying, manipulating, and creating HTML elements.",
            "stargazers_count": 344,
            "created_at": "2021-10-11T21:15:19Z",
            "downloads_this_week": 16,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "db-to-sqlite",
            "full_name": "simonw/db-to-sqlite",
            "owner": "simonw",
            "description": "CLI tool for exporting tables or queries from any SQL database to a SQLite file",
            "stargazers_count": 279,
            "created_at": "2019-01-17T04:16:48Z",
            "downloads_this_week": 279,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "twitter-to-sqlite",
            "full_name": "dogsheep/twitter-to-sqlite",
            "owner": "dogsheep",
            "description": "Save data from Twitter to a SQLite database",
            "stargazers_count": 269,
            "created_at": "2019-09-03T19:30:08Z",
            "downloads_this_week": 17,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "ttok",
            "full_name": "simonw/ttok",
            "owner": "simonw",
            "description": "Count and truncate text based on tokens",
            "stargazers_count": 231,
            "created_at": "2023-05-18T18:22:59Z",
            "downloads_this_week": 13,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "geocode-sqlite",
            "full_name": "eyeseast/geocode-sqlite",
            "owner": "eyeseast",
            "description": "Geocode rows in a SQLite database table",
            "stargazers_count": 223,
            "created_at": "2020-09-06T21:05:39Z",
            "downloads_this_week": 42,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "openai-to-sqlite",
            "full_name": "simonw/openai-to-sqlite",
            "owner": "simonw",
            "description": "Save OpenAI API results to a SQLite database",
            "stargazers_count": 211,
            "created_at": "2023-01-03T01:14:22Z",
            "downloads_this_week": 19,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "csv-diff",
            "full_name": "simonw/csv-diff",
            "owner": "simonw",
            "description": "Python CLI tool and library for diffing CSV and JSON files",
            "stargazers_count": 198,
            "created_at": "2019-03-13T01:11:26Z",
            "downloads_this_week": 8225,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "s3-credentials",
            "full_name": "simonw/s3-credentials",
            "owner": "simonw",
            "description": "A tool for creating credentials for accessing S3 buckets",
            "stargazers_count": 183,
            "created_at": "2021-11-02T20:09:50Z",
            "downloads_this_week": 36,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-http",
            "full_name": "asg017/sqlite-http",
            "owner": "asg017",
            "description": "A SQLite extension for making HTTP requests purely in SQL",
            "stargazers_count": 182,
            "created_at": "2021-10-11T21:27:49Z",
            "downloads_this_week": 23,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "git-history",
            "full_name": "simonw/git-history",
            "owner": "simonw",
            "description": "Tools for analyzing Git history using SQLite",
            "stargazers_count": 172,
            "created_at": "2021-11-11T20:07:06Z",
            "downloads_this_week": 147,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "google-drive-to-sqlite",
            "full_name": "simonw/google-drive-to-sqlite",
            "owner": "simonw",
            "description": "Create a SQLite database containing metadata from Google Drive",
            "stargazers_count": 133,
            "created_at": "2022-02-16T02:16:29Z",
            "downloads_this_week": 4,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "datasette-extract",
            "full_name": "datasette/datasette-extract",
            "owner": "datasette",
            "description": "Import unstructured data (text and images) into structured tables",
            "stargazers_count": 133,
            "created_at": "2023-08-15T19:57:04Z",
            "downloads_this_week": 41,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "sqlite-regex",
            "full_name": "asg017/sqlite-regex",
            "owner": "asg017",
            "description": "A fast regular expression SQLite extension, written in Rust",
            "stargazers_count": 132,
            "created_at": "2022-11-04T18:31:01Z",
            "downloads_this_week": 209,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "datasette-dashboards",
            "full_name": "rclement/datasette-dashboards",
            "owner": "rclement",
            "description": "Datasette plugin providing data dashboards from metadata",
            "stargazers_count": 130,
            "created_at": "2021-04-23T21:56:48Z",
            "downloads_this_week": 123,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "dogsheep-photos",
            "full_name": "dogsheep/dogsheep-photos",
            "owner": "dogsheep",
            "description": "Upload your photos to S3 and import metadata about them into a SQLite database",
            "stargazers_count": 124,
            "created_at": "2020-04-18T19:22:13Z",
            "downloads_this_week": 2,
            "is_plugin": 0,
            "is_tool": 1
        }
    ]
}

Followed by:

To determine the most interesting repositories from the provided data, we can consider a combination of factors such as:

  1. Stargazer Count: A higher number of stars indicates popularity and interest from the community.
  2. Downloads This Week: This metric reflects current usage and relevance.
  3. Recency: More recent creations might indicate active development or relevance to current trends.
  4. Description: The purpose and utility of the repository can also be a factor in determining its interest level.

Based on these factors, here are some of the most interesting repositories:

  1. datasette (simonw/datasette)

    • Stars: 9182
    • Downloads This Week: 4643
    • Description: An open source multi-tool for exploring and publishing data.
    • Created At: 2017-10-23
  2. sqlite-utils (simonw/sqlite-utils)

    • Stars: 1575
    • Downloads This Week: 21452
    • Description: Python CLI utility and library for manipulating SQLite databases.
    • Created At: 2018-07-14
  3. llm (simonw/llm)

    • Stars: 3512
    • Downloads This Week: 1125
    • Description: Access large language models from the command-line.
    • Created At: 2023-04-01
  4. csv-diff (simonw/csv-diff)

    • Stars: 198
    • Downloads This Week: 8225
    • Description: Python CLI tool and library for diffing CSV and JSON files.
    • Created At: 2019-03-13
  5. shot-scraper (simonw/shot-scraper)

    • Stars: 1490
    • Downloads This Week: 1826
    • Description: A command-line utility for taking automated screenshots of websites.
    • Created At: 2022-03-08
  6. datasette-extract (datasette/datasette-extract)

    • Stars: 133
    • Downloads This Week: 41
    • Description: Import unstructured data (text and images) into structured tables.
    • Created At: 2023-08-15
  7. geocode-sqlite (eyeseast/geocode-sqlite)

    • Stars: 223
    • Downloads This Week: 42
    • Description: Geocode rows in a SQLite database table.
    • Created At: 2020-09-06

These repositories stand out due to their functionality, usage trends, and community engagement, making them particularly interesting.

That SQL query in full:

SELECT
  repos.name AS name,
  repos.full_name AS full_name,
  users.login AS owner,
  repos.description AS description,
  repos.stargazers_count,
  repos.created_at AS created_at,
  (
    SELECT
      SUM(downloads)
    FROM
      stats
    WHERE
      stats.package = repos.name
      AND stats.date > DATE('now', '-7 days')
  ) AS downloads_this_week,
  (
    SELECT
      COUNT(*)
    FROM
      plugin_repos
    WHERE
      repo = repos.full_name
  ) AS is_plugin,
  (
    SELECT
      COUNT(*)
    FROM
      tool_repos
    WHERE
      repo = repos.full_name
  ) AS is_tool
FROM
  repos
JOIN
  users ON users.id = repos.owner
WHERE
  repos.stargazers_count > (
    SELECT AVG(stargazers_count) FROM repos
  )
ORDER BY
  repos.stargazers_count DESC;
simonw commented 3 months ago

Definitely need a sensible limit on how much data we fire back into the model there! That SQL query didn't have a limit clause, could have burned a lot of tokens.