Open eyeseast opened 2 years ago
I like add-generated-column
- feels very clear to me, and is a nice place for adding logic that checks if the DB version supports it or not and shows a useful error.
The Python API could be:
db[table_name].add_generated_column("field", str, "json_extract(data, '$.field')", stored=True)
Just spotted this in https://www.sqlite.org/gencol.html
The only functional difference is that one cannot add new STORED columns using the ALTER TABLE ADD COLUMN command. Only VIRTUAL columns can be added using ALTER TABLE.
So to add stored columns to an existing table we would need to use the .transform()
trick. Which implies that this should actually be a capability of the various .create()
methods, since transform works by creating a new table with those and then copying across the old data.
Here's where .transform()
calls .create_table_sql()
under the hood:
Two new parameters to .create_table()
and friends:
generated={...}
- generated column definitionsgenerated_stored={...}
generated stored column definitionsThese columns will be added at the end of the table, but you can use the column_order=
parameter to apply a different order.
I'm coming round to your suggestion to have this as extra arguments to sqlite-utils add-column
now, especially since you also need to pass a column type.
I'd like to come up with syntax for sqlite-utils create-table
as well.
https://sqlite-utils.datasette.io/en/stable/cli-reference.html#create-table
Maybe extra --generated-stored colname expression
(and --generated
) options would work there.
Good call on adding this to create-table
, especially for stored columns. Having the stored/virtual split might make this tricky to implement, but I haven't gone any farther than thinking about what the CLI looks like. I'm going to try making the SQL side work first and figure that'll tell me more about what it needs.
I tried it out in Jupyter and it works as advertised:
Introspection is a bit weird: there doesn't seem to be a way to introspect generated columns outside of parsing the stored SQL schema for the columns at the moment! And the .columns
method doesn't return them at all:
Here's why:
>>> db.execute("PRAGMA table_info('t')").fetchall()
[(0, 'body', 'TEXT', 0, None, 0)]
>>> db.execute("PRAGMA table_xinfo('t')").fetchall()
[(0, 'body', 'TEXT', 0, None, 0, 0), (1, 'd', 'INT', 0, None, 0, 2)]
So table_xinfo()
is needed to get back columns including generated columns: https://www.sqlite.org/pragma.html#pragma_table_xinfo
PRAGMA *schema.table_xinfo(table-name*);
This pragma returns one row for each column in the named table, including hidden columns in virtual tables. The output is the same as for PRAGMA table_info except that hidden columns are shown rather than being omitted.
Trying to figure out what that extra field in table_info
compared to table_xinfo
is:
>>> list(db.query("PRAGMA table_xinfo('t')"))
[{'cid': 0,
'name': 'body',
'type': 'TEXT',
'notnull': 0,
'dflt_value': None,
'pk': 0,
'hidden': 0},
{'cid': 1,
'name': 'd',
'type': 'INT',
'notnull': 0,
'dflt_value': None,
'pk': 0,
'hidden': 2}]
``
Presumably `hidden` 0 v.s 2 v.s. other values has meaning.
This is a fairly new feature -- SQLite version 3.31.0 (2020-01-22) -- that I, admittedly, haven't gotten to work yet. But it looks incredibly useful: https://dgl.cx/2020/06/sqlite-json-support
I'm not sure if this is an option on
add-column
or a separate command likeadd-generated-column
. Either way, it needs an argument to populate it. It could be something like this:More here: https://www.sqlite.org/gencol.html