simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.57k stars 691 forks source link

`/db/-/create` API for creating tables #1882

Closed simonw closed 2 years ago

simonw commented 2 years ago

It really feels like this should be accompanied by a /db/-/create API for creating tables. I had to add that to sqlite-utils eventually (initially it only supported creating by passing in an example document):

https://sqlite-utils.datasette.io/en/stable/cli.html#cli-create-table

Originally posted by @simonw in https://github.com/simonw/datasette/issues/1862#issuecomment-1299073433

simonw commented 2 years ago

API design for this:

POST /db/-/create
Authorization: Bearer xxx
Content-Type: application/json
{
    "table": {
        "name": "my new table",
        "columns": [
            {
                "name": "id",
                "type": "integer"
            },
            {
                "name": "title",
                "type": "text"
            }
        ]
       "pk": "id"
    }
}

Supported column types are:

This matches my design for sqlite-utils: https://sqlite-utils.datasette.io/en/stable/cli.html#cli-create-table

simonw commented 2 years ago

Validating this JSON object is getting a tiny bit complex. I'm tempted to adopt https://pydantic-docs.helpmanual.io/ at this point.

The create_model example on https://stackoverflow.com/questions/66168517/generate-dynamic-model-using-pydantic/66168682#66168682 is particularly relevant, especially when I work on this issue:

from pydantic import create_model

d = {"strategy": {"name": "test_strat2", "periods": 10}}

Strategy = create_model("Strategy", **d["strategy"])

print(Strategy.schema_json(indent=2))

create_model(): https://pydantic-docs.helpmanual.io/usage/models/#dynamic-model-creation

simonw commented 2 years ago

Mocked up a quick HTML+JavaScript form for creating that JSON structure using some iteration against Copilot prompts:

<pre>
/* JSON format:
{
  "table": {
      "name": "my new table",
      "columns": [
          {
              "name": "id",
              "type": "integer"
          },
          {
              "name": "title",
              "type": "text"
          }
      ]
     "pk": "id"
  }
}

HTML form with Javascript for creating this JSON:
*/</pre>
<form id="create-table-form">
  <label for="table-name">Table name</label>
  <input type="text" id="table-name" name="table-name" required><br>
  <label for="table-pk">Primary key</label>
  <input type="text" id="table-pk" name="table-pk" required><br>
  <label for="column-name">Column name</label>
  <input type="text" id="column-name" name="column-name" required>
  <label for="column-type">Column type</label>
  <input type="text" id="column-type" name="column-type" required>
  <button type="button" id="add-column">Add column</button>
  <p>Current columns:</p>
  <ul id="columns"></ul>
  <button type="button" id="create-table">Create table</button>
</form>
<script>
var form = document.getElementById('create-table-form');
var tableName = document.getElementById('table-name');
var tablePk = document.getElementById('table-pk');
var columnName = document.getElementById('column-name');
var columnType = document.getElementById('column-type');
var addColumn = document.getElementById('add-column');
var createTable = document.getElementById('create-table');
var columns = [];
addColumn.addEventListener('click', () => {
  columns.push({
    name: columnName.value,
    type: columnType.value
  });
  var li = document.createElement('li');
  li.textContent = columnName.value + ' (' + columnType.value + ')';
  // Add a delete button to each column
  var deleteButton = document.createElement('button');
  deleteButton.textContent = 'Delete';
  deleteButton.addEventListener('click', () => {
    columns.splice(columns.indexOf(li), 1);
    li.remove();
  });
  li.appendChild(deleteButton);
  document.getElementById('columns').appendChild(li);
});
createTable.addEventListener('click', () => {
  var table = {
    name: tableName.value,
    pk: tablePk.value,
    columns: columns
  };
  console.log(JSON.stringify(table, null, 2));
});
</script>
simonw commented 2 years ago

I made a decision here that this endpoint should also accept an optional "rows": [...] list which is used to automatically create the table using a schema derived from those example rows (which then get inserted):

simonw commented 2 years ago

On that basis I think the core API design should change to this:

POST /db/-/create
Authorization: Bearer xxx
Content-Type: application/json
{
    "name": "my new table",
    "columns": [
        {
            "name": "id",
            "type": "integer"
        },
        {
            "name": "title",
            "type": "text"
        }
    ]
   "pk": "id"
}

This leaves room for a "rows": [] key at the root too. Having that as a child of "table" felt unintuitive to me, and I didn't like the way this looked either:

{
  "table": {
    "name": "my_new_table"
  },
  "rows": [
    {"id": 1, "title": "Title"}
  ]
}

Weird to have the table name nested inside table when rows wasn't.

simonw commented 2 years ago

Thought of an edge-case: with sqlite-utils one feature I really like is that I can pipe data into it without caring if the table already exists or not:

cat data.json | sqlite-utils insert my.db mytable -

How could this new API support that?

I thought about adding a "create": true option to /db/table/-/insert which creates the table if it doesn't already exist, but if I do that I'll need to start adding other options to that endpoint - to set the primary key, add foreign keys and suchlike - which would be ignored except for the cases where the table was being created from scratch.

This doesn't feel right to me - I want to keep those options here, on /db/-/create.

One idea I had was to implement it such that you can call /db/-/create multiple times for the same table, but only if you are using the "rows" option. If so, and if the rows can be safely inserted, it would let you do that.

But instead, I'm going to outsource this to the CLI tool I plan to write that feeds data into this API. I'm already planning to use that tool for CSV inserts (so the API doesn't need to accept CSV directly). I think it's a good place for other usability enhancements like "insert this, creating the table if it does not exist" as well.

simonw commented 2 years ago

Need to validate the table name. SQLite supports almost any table name - but they can't contain a newline character and cannot start with sqlite_ - according to https://stackoverflow.com/questions/3694276/what-are-valid-table-names-in-sqlite/43049720#43049720

simonw commented 2 years ago

What should this API return?

I think the name of the table (name), the URL to that table (table_url - for consistency with how faceting API works already) and the schema of the table (schema).

simonw commented 2 years ago

Tried out my prototype in the API explorer:

image

The "name" on the output is bothering me a bit - should it be table_name or table instead?

Problem is I really like name for the input, so should it be consistent to have the same name on the output here, or should I aim for consistency with other endpoints that might return table rather than the ambiguous name elsewhere?

simonw commented 2 years ago

I'm going to change it to table in the output AND the input.

simonw commented 2 years ago

I like this:

image
{
  "ok": true,
  "database": "data",
  "table": "agai2n",
  "table_url": "http://127.0.0.1:8001/data/agai2n",
  "schema": "CREATE TABLE [agai2n] (\n   [hello] INTEGER\n)"
}
simonw commented 2 years ago

Documentation:

Wrote a TIL about how I wrote some of those tests with Copilot: https://til.simonwillison.net/gpt3/writing-test-with-copilot