simonw / datasette-chatgpt-plugin

A Datasette plugin that turns a Datasette instance into a ChatGPT plugin
https://datasette.io/plugins/datasette-chatgpt-plugin
Apache License 2.0
67 stars 0 forks source link

Initial design #1

Closed simonw closed 1 year ago

simonw commented 1 year ago

Goal is to have a plugin that turns any Datasette instance into a ChatGPT plugin - providing instructions for ChatGPT to execute SQL queries against it.

simonw commented 1 year ago

Documentation: https://platform.openai.com/docs/plugins/introduction

Mainly needs to provide a /.well-known/ai-plugin.json that points to a OpenAPI specification.

simonw commented 1 year ago

I got ChatGPT to build me this OpenAPI specification:

openapi: 3.0.0
info:
  title: Datasette API
  version: 1.0.0
paths:
  /fixtures.json:
    get:
      summary: Retrieve data from the facetable
      description: This endpoint allows users to execute SQL queries on the facetable and retrieve the results as an array of JSON objects.
      parameters:
        - name: sql
          in: query
          description: The SQL query to be executed on the facetable.
          required: true
          schema:
            type: string
        - name: _shape
          in: query
          description: The shape of the response data. Set to "array" to receive an array of JSON objects.
          required: true
          schema:
            type: string
            enum:
              - array
      responses:
        '200':
          description: Successful operation
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error

From this prompt:

Write an OpenAPI schema explaining the https://latest.datasette.io/fixtures.json?sql=select+*+from+facetable&_shape=array GET API which accepts SQL and returns an array of JSON objects

simonw commented 1 year ago

Also this example:

simonw commented 1 year ago

Initial prototype (in plugins/chatgpt.py on Glitch):

from datasette import hookimpl, Response

SCHEMA = """
openapi: 3.0.0
info:
  title: Datasette API
  version: 1.0.0
paths:
  /fixtures.json:
    get:
      summary: Execute SQL against the fixtures database
      description: This endpoint allows users to execute SQLite SQL queries and retrieve the results as an array of JSON objects.
      parameters:
        - name: sql
          in: query
          description: The SQL query to be executed
          required: true
          schema:
            type: string
        - name: _shape
          in: query
          description: The shape of the response data. Must be "array"
          required: true
          schema:
            type: string
            enum:
              - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
"""

AI_PLUGIN = {
    "schema_version": "v1",
    "name_for_model": "sql-fixtures",
    "name_for_human": "SQL against fixtures",
    "description_for_model": "Run SQL queries against the fixtures database.",
    "description_for_human": "Run SQL against Datasette.",
    "api": {
      "type": "openapi",
      "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
      "has_user_authentication": False
    },
    "contact_email": "hello@contact.com", 
    "legal_info_url": "hello@legal.com"
}

async def ai_plugin(request):
    return Response.json(AI_PLUGIN)

async def openapi_schema(request):
    return Response.text(SCHEMA)

@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]
image
simonw commented 1 year ago

Fixed it:

image
AI_PLUGIN = {
    "schema_version": "v1",
    "name_for_model": "sql_fixtures",
    "name_for_human": "SQL against fixtures",
    "description_for_model": "Run SQL queries against the fixtures database.",
    "description_for_human": "Run SQL against Datasette.",
    "auth": {
      "type": "user_http",
      "authorization_type": "bearer"
    },
    "api": {
      "type": "openapi",
      "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
      "has_user_authentication": False
    },
    "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
    "contact_email": "hello@contact.com", 
    "legal_info_url": "hello@legal.com"
}
simonw commented 1 year ago
CleanShot 2023-03-23 at 22 10 15@2x image

I'd like to tell it it doesn't need an auth token.

simonw commented 1 year ago
image
simonw commented 1 year ago

I can't figure out why I'm getting that UnrecognizedFunctionError. The server logs don't show any hits to my API.

simonw commented 1 year ago

Found more documentation: https://platform.openai.com/docs/plugins/getting-started/plugin-manifest - visible on desktop but not I think on mobile.

simonw commented 1 year ago

I'm trying to include the schema, but I think I broke it:

from datasette import hookimpl, Response
import textwrap

PROMPT = """
Run SQLite queries against the content SQL database.
Use 'select * from sqlite_master' to see the list of tables.
Use 'select * from pragma_table_info("name_of_table")' to see the columns in a table
NEVER use 'PRAGMA table_info' or any other PRAGMA - always use 'pragma_table_info()' and suchlike instead
"""

def make_openapi_schema(prompt):
    return """
openapi: 3.0.1
info:
  title: Datasette fixtures plugin
  description: A plugin that allows ChatGPT to run SQL queries against a Datasette SQLite database
  version: 'v1'
servers:
  - url: https://cedar-antique-cosmos.glitch.me
paths:
  /content.json:
    get:
      operationId: query
      summary: Execute a SQLite SQL query against the content database
      description: |-
{}
      parameters:
      - name: sql
        in: query
        description: The SQL query to be executed
        required: true
        schema:
          type: string
      - name: _shape
        in: query
        description: The shape of the response data. Must be "array"
        required: true
        schema:
          type: string
          enum:
            - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
""".format(textwrap.indent(prompt, '        '))

def ai_plugin_schema(extra_prompt):
    return {
        "schema_version": "v1",
        "name_for_model": "datasette_content",
        "name_for_human": "SQL against content",
        "description_for_model": PROMPT,
        "description_for_human": "Run SQL against content database in Datasette.",
        "auth": {
            "type": "none"
        },
        "api": {
          "type": "openapi",
          "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
          "has_user_authentication": False
        },
        "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
        "contact_email": "hello@contact.com", 
        "legal_info_url": "hello@legal.com"
    }

async def ai_plugin():
    return Response.json(ai_plugin_schema(""))

async def build_schema(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(sql, ';') from sqlite_master")).single_value()

async def openapi_schema(datasette):
    return Response.text(make_openapi_schema(await build_schema(datasette)))

@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]
simonw commented 1 year ago

Tried uninstalling and reinstalling the plugin and got this:

image
simonw commented 1 year ago

Oh dear it hallucinated!

image

Despite that query returning the correct results:

image
simonw commented 1 year ago

My current working plugin is:

from datasette import hookimpl, Response
import textwrap

PROMPT = """
Run SQLite queries against the content SQL database.
Use 'select * from sqlite_master' to see the list of tables.
Use 'select * from pragma_table_info("name_of_table")' to see the columns in a table
NEVER use 'PRAGMA table_info' or any other PRAGMA - always use 'pragma_table_info()' and suchlike instead
"""

def make_openapi_schema(prompt):
    return """
openapi: 3.0.1
info:
  title: Datasette fixtures plugin
  description: A plugin that allows ChatGPT to run SQL queries against a Datasette SQLite database
  version: 'v1'
servers:
  - url: https://cedar-antique-cosmos.glitch.me
paths:
  /content.json:
    get:
      operationId: query
      summary: Execute a SQLite SQL query against the content database
      description: |-
{}
      parameters:
      - name: sql
        in: query
        description: The SQL query to be executed
        required: true
        schema:
          type: string
      - name: _shape
        in: query
        description: The shape of the response data. Must be "array"
        required: true
        schema:
          type: string
          enum:
            - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
""".format(textwrap.indent(prompt, '        '))

def ai_plugin_schema(extra_prompt):
    return {
        "schema_version": "v1",
        "name_for_model": "datasette_content",
        "name_for_human": "SQL against content",
        "description_for_model": PROMPT + ' ' + extra_prompt,
        "description_for_human": "Run SQL against content database in Datasette.",
        "auth": {
            "type": "none"
        },
        "api": {
          "type": "openapi",
          "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
          "has_user_authentication": False
        },
        "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
        "contact_email": "hello@contact.com", 
        "legal_info_url": "hello@legal.com"
    }

async def ai_plugin(datasette):
    return Response.json(ai_plugin_schema(await available_tables(datasette)))

async def build_schema(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(sql, ';') from sqlite_master")).single_value()

async def available_tables(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(name, ', ') from sqlite_master where type = 'table'")).single_value()

async def openapi_schema():
    return Response.text(make_openapi_schema("Use SQLite syntax"))

@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]

The schema proved too large so I'm sending a list of tables instead.

simonw commented 1 year ago

OK, this prototype is working well enough that I'm going to release it as an alpha.

Schema discovery still isn't great - it keeps trying to run those PRAGMA commands. Maybe I should give it an explicit API for that, or perhaps enable PRAGMA table_info since it's so keen to use that.

simonw commented 1 year ago

https://platform.openai.com/docs/plugins/getting-started/plugin-manifest limits:

  • 50 character max for name_for_human
  • 50 character max for name_for_model
  • 120 character max for description_for_human
  • 8000 character max just for description_for_model (will decrease over time)

It's not clear to me how important name_for_model is. Since this Datasette plugin could be installed on many instances I don't think it should be the same for each one, and it looks like hyphens are not allowed but underscores are.

So maybe datasette_latest_datasette_io would be a good name, using the server hostname with underscores?

What if that's longer than 50 characters?

>>> len("datasette_global_power_plants_datasettes_com")
44
>>> len("datasette_congress_legislators_datasettes.com")
45

So I'm already getting close with some of my longer named examples.

I'm tempted to go with datasette_as_much_as_possible_hexhash - ending in a 6 character hash of the whole value. That should give me good uniqueness without busting the length limit.

simonw commented 1 year ago

Removing this code since I'm not calling it:


async def build_schema(datasette):
    database = [
        value for key, value in datasette.databases.items() if not key.startswith("_")
    ][0]
    return (
        await database.execute("select group_concat(sql, ';') from sqlite_master")
    ).single_value()

async def available_tables(datasette):
    database = [
        value for key, value in datasette.databases.items() if not key.startswith("_")
    ][0]
    return (
        await database.execute(
            "select group_concat(name, ', ') from sqlite_master where type = 'table'"
        )
    ).single_value()
simonw commented 1 year ago

I don't see OpenAI checking for updates to my ai-plugin.json file - so I'm going to try a fresh deploy with a new URL, since I also can't see how to uninstall and reinstall a plugin to trigger that refresh.

Tried deploying a new Glitch instance and got this:

image

But I caught it doing a fetch() to an API that failed to return my new plugin:

image

Figured that out:

image

You need to hit "Develop your own plugin" and not "Install an unverified plugin".

simonw commented 1 year ago
CleanShot 2023-03-24 at 07 09 22@2x
simonw commented 1 year ago

OK! The latest code deployed fresh does indeed seem to work - and doesn't try to run the PRAGMA table_info things any more so it seems to be obeying the latest prompt.

simonw commented 1 year ago

Blogged it here: https://simonwillison.net/2023/Mar/24/datasette-chatgpt-plugin/