Closed simonw closed 2 years ago
API design for this:
POST /db/-/create
Authorization: Bearer xxx
Content-Type: application/json
{
"table": {
"name": "my new table",
"columns": [
{
"name": "id",
"type": "integer"
},
{
"name": "title",
"type": "text"
}
]
"pk": "id"
}
}
Supported column types are:
integer
text
float
(even though SQLite calls it a "real")blob
This matches my design for sqlite-utils
: https://sqlite-utils.datasette.io/en/stable/cli.html#cli-create-table
Validating this JSON object is getting a tiny bit complex. I'm tempted to adopt https://pydantic-docs.helpmanual.io/ at this point.
The create_model
example on https://stackoverflow.com/questions/66168517/generate-dynamic-model-using-pydantic/66168682#66168682 is particularly relevant, especially when I work on this issue:
from pydantic import create_model
d = {"strategy": {"name": "test_strat2", "periods": 10}}
Strategy = create_model("Strategy", **d["strategy"])
print(Strategy.schema_json(indent=2))
create_model()
: https://pydantic-docs.helpmanual.io/usage/models/#dynamic-model-creation
Mocked up a quick HTML+JavaScript form for creating that JSON structure using some iteration against Copilot prompts:
<pre>
/* JSON format:
{
"table": {
"name": "my new table",
"columns": [
{
"name": "id",
"type": "integer"
},
{
"name": "title",
"type": "text"
}
]
"pk": "id"
}
}
HTML form with Javascript for creating this JSON:
*/</pre>
<form id="create-table-form">
<label for="table-name">Table name</label>
<input type="text" id="table-name" name="table-name" required><br>
<label for="table-pk">Primary key</label>
<input type="text" id="table-pk" name="table-pk" required><br>
<label for="column-name">Column name</label>
<input type="text" id="column-name" name="column-name" required>
<label for="column-type">Column type</label>
<input type="text" id="column-type" name="column-type" required>
<button type="button" id="add-column">Add column</button>
<p>Current columns:</p>
<ul id="columns"></ul>
<button type="button" id="create-table">Create table</button>
</form>
<script>
var form = document.getElementById('create-table-form');
var tableName = document.getElementById('table-name');
var tablePk = document.getElementById('table-pk');
var columnName = document.getElementById('column-name');
var columnType = document.getElementById('column-type');
var addColumn = document.getElementById('add-column');
var createTable = document.getElementById('create-table');
var columns = [];
addColumn.addEventListener('click', () => {
columns.push({
name: columnName.value,
type: columnType.value
});
var li = document.createElement('li');
li.textContent = columnName.value + ' (' + columnType.value + ')';
// Add a delete button to each column
var deleteButton = document.createElement('button');
deleteButton.textContent = 'Delete';
deleteButton.addEventListener('click', () => {
columns.splice(columns.indexOf(li), 1);
li.remove();
});
li.appendChild(deleteButton);
document.getElementById('columns').appendChild(li);
});
createTable.addEventListener('click', () => {
var table = {
name: tableName.value,
pk: tablePk.value,
columns: columns
};
console.log(JSON.stringify(table, null, 2));
});
</script>
I made a decision here that this endpoint should also accept an optional "rows": [...]
list which is used to automatically create the table using a schema derived from those example rows (which then get inserted):
On that basis I think the core API design should change to this:
POST /db/-/create
Authorization: Bearer xxx
Content-Type: application/json
{
"name": "my new table",
"columns": [
{
"name": "id",
"type": "integer"
},
{
"name": "title",
"type": "text"
}
]
"pk": "id"
}
This leaves room for a "rows": []
key at the root too. Having that as a child of "table"
felt unintuitive to me, and I didn't like the way this looked either:
{
"table": {
"name": "my_new_table"
},
"rows": [
{"id": 1, "title": "Title"}
]
}
Weird to have the table name
nested inside table
when rows
wasn't.
Thought of an edge-case: with sqlite-utils
one feature I really like is that I can pipe data into it without caring if the table already exists or not:
cat data.json | sqlite-utils insert my.db mytable -
How could this new API support that?
I thought about adding a "create": true
option to /db/table/-/insert
which creates the table if it doesn't already exist, but if I do that I'll need to start adding other options to that endpoint - to set the primary key, add foreign keys and suchlike - which would be ignored except for the cases where the table was being created from scratch.
This doesn't feel right to me - I want to keep those options here, on /db/-/create
.
One idea I had was to implement it such that you can call /db/-/create
multiple times for the same table, but only if you are using the "rows"
option. If so, and if the rows can be safely inserted, it would let you do that.
But instead, I'm going to outsource this to the CLI tool I plan to write that feeds data into this API. I'm already planning to use that tool for CSV inserts (so the API doesn't need to accept CSV directly). I think it's a good place for other usability enhancements like "insert this, creating the table if it does not exist" as well.
Need to validate the table name. SQLite supports almost any table name - but they can't contain a newline character and cannot start with sqlite_
- according to https://stackoverflow.com/questions/3694276/what-are-valid-table-names-in-sqlite/43049720#43049720
What should this API return?
I think the name of the table (name
), the URL to that table (table_url
- for consistency with how faceting API works already) and the schema of the table (schema
).
Tried out my prototype in the API explorer:
The "name"
on the output is bothering me a bit - should it be table_name
or table
instead?
Problem is I really like name
for the input, so should it be consistent to have the same name on the output here, or should I aim for consistency with other endpoints that might return table
rather than the ambiguous name
elsewhere?
I'm going to change it to table
in the output AND the input.
I like this:
{
"ok": true,
"database": "data",
"table": "agai2n",
"table_url": "http://127.0.0.1:8001/data/agai2n",
"schema": "CREATE TABLE [agai2n] (\n [hello] INTEGER\n)"
}
Documentation:
Wrote a TIL about how I wrote some of those tests with Copilot: https://til.simonwillison.net/gpt3/writing-test-with-copilot
Originally posted by @simonw in https://github.com/simonw/datasette/issues/1862#issuecomment-1299073433