Closed 4l1fe closed 1 year ago
SQLite doesn't have a native DATETIME
type. It stores dates internally as strings and then has functions to work with date-like strings. Yes it's weird.
SQLite doesn't have a native
DATETIME
type. It stores dates internally as strings and then has functions to work with date-like strings. Yes it's weird.
That's correct. But my issue is about the application level libraries that, i suppose, have better data understanding if see a specific type such as DATETIME
.
I'm writing data with dataset i've mentioned. The lib changes its behavior depending on a type. I saw different behavior with types DATETIME, FLOAT, TEXT
. Dataset, for their part, is built upon Sqlalchemy, you know what it is.
To be honest, i didn't dive into the details of why the behavior changes, but when i altered manually by other util a type of column to DATETIME
things got back to normal.
On the matter, can i achieve it with Sqlite Utils at the moment?
I think it's not currently possible: sqlite-utils requires that it be one of integer
, text
, float
, blob
(see code)
IMO, this is a bit of friction and it would be nice if it was more permissive. SQLite permits developers to use any data type when creating a table. For example, this is a perfectly cromulent sqlite session that creates a table with columns of type baz
and bar
:
sqlite> create table foo(column1 baz, column2 bar);
sqlite> .schema foo
CREATE TABLE foo(column1 baz, column2 bar);
sqlite> select * from pragma_table_info('foo');
cid name type notnull dflt_value pk
---------- ---------- ---------- ---------- ---------- ----------
0 column1 baz 0 0
1 column2 bar 0 0
The idea is that the application developer will know what meaning to ascribe to those types. For example, I'm working on a plugin to Datasette. Dates are tricky to handle. If you have some existing rows, you can look at the values in them to know how a user is serializing the dates -- as an ISO 8601 string? An RFC 3339 string? With millisecond precision? With timezone offset? But if you don't yet have any rows, you have to guess. If the column is of type TEXT
, you don't even know that it's meant to hold a date! In this case, my plugin will look to see if the column is of type DATE
or DATETIME
, and assume a certain representation when writing.
Perhaps there is an argument that sqlite-utils is trying to conform to SQLite's strict mode, and that is why it limits the choices. In strict mode, SQLite requires that the data type be one of INT
, INTEGER
, REAL
, TEXT
, BLOB
, ANY
. But that can't be the case -- sqlite-utils supports FLOAT
, which is not one of the valid types in strict mode, and it rejects INT
, REAL
and ANY
, which are valid.
That said, it looks like the check is only enforced at the CLI level. If you use the API directly, I think it'll work.
That said, it looks like the check is only enforced at the CLI level. If you use the API directly, I think it'll work.
It works, but a column becomes TEXT
In [1]: import sqlite_utils
In [2]: db = sqlite_utils.Database('events.sqlite')
In [3]: table = db['cards.chunk.get']
In [4]: table.columns_dict
Out[4]:
{'id': int,
'timestamp': float,
'data_chunk_number': int,
'user_id': str,
'meta_duplication_source_id': int,
'context_sort_attribute': str,
'context_sort_order': str}
In [5]: from datetime import datetime
In [7]: table.transform(types={'timestamp': datetime})
In [8]: table.columns_dict
Out[8]:
{'id': int,
'timestamp': str,
'data_chunk_number': int,
'user_id': str,
'meta_duplication_source_id': int,
'context_sort_attribute': str,
'context_sort_order': str}
❯ sqlite-utils schema events.sqlite cards.chunk.get
CREATE TABLE "cards.chunk.get" (
[id] INTEGER PRIMARY KEY NOT NULL,
[timestamp] TEXT,
...
Ah, it looks like that is controlled by this dict: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L178
I suspect you could overwrite the datetime entry to achieve what you want
I could, of course.
Doest it worth bringing such the improvement to the library?
I'd support that, but I'm not the author of this library.
One challenge is that would be a breaking change. Do you see a way to enable it without affecting existing users or bumping the major version number?
Do you see a way to enable it without affecting existing users or bumping the major version number?
I don't see a clean solution, only extending code with a side variable that tells us we want to apply advanced types instead of basic.
it could be a similiar command like tranform-v2 --type column DATETIME
or a cli option transform --adv-type column DATETIME
along with a dict that contains the advanced types. Then with knowledge that we run an advanced command we take that dictionary somehow, we can wrap the current and new dictionaries by a superdict and work with it everywhere according to the knowledge. This way shouldn't affect users who are using the previous lib versions and it have to be merged in the next major one.
But this way looks a bad design, too messy.
I think the bigger issue is that sqlite-utils
mixes mechanism (it implements the 12-step way to alter SQLite tables) and policy (it has an opinionated stance on what column types should be used).
That might be a design choice to make it accessible to users by providing a reasonable set of defaults, but it doesn't quite fit my use case.
It might make sense to extract a separate library that provides just the mechanisms, and then sqlite-utils
would sit on top of that library with its opinionated set of policies.
That would be a very big change, though.
I might take a stab at extracting the library, but just for the table schema migration piece, not all the other features that sqlite-utils
supports. I wouldn't expect sqlite-utils
to depend on it.
Part of my motivation is that I want to provide some other abilities, too, like support for CHECK constraints. I see that the issue in this repo (https://github.com/simonw/sqlite-utils/issues/358) proposes a bunch of short-hand constraints, which I wouldn't want to accidentally expose to people -- I want a layer that is a 1:1 mapping to SQLite.
Isn't your suggestion too fundamental for the utility?
The bigger flexibility, the bigger complexity. Your idea make sense defenitely, but how often do you make schema changes? And how many people could benefit from it, what do you think?
Ah, it looks like that is controlled by this dict: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L178
I suspect you could overwrite the datetime entry to achieve what you want
And thank you for pointing me to it. At least, i can make a monkey patch for my need...
Ha, yes, I might end up making something very niche. That's OK.
I'm building a UI for Datasette that lets users make schema changes, so it's important to me that the tool work in a non-surprising way -- if you ask for a column of type X, you should get type X. If the column or table previously had CHECK constraints, they shouldn't be silently removed. And so on. I had hoped that I could just lean on sqlite-utils, but I think it's a little too surprising.
lets users make schema changes, so it's important to me that the tool work in a non-surprising way -- if you ask for a column of type X, you should get type X. If the column or table previously had CHECK constraints, they shouldn't be silently removed
I've got your concern. Let's see if we will be replied on it and i'll close the issue some later.
I live the patch here for others:
original code
$ which sqlite-utils | xargs cat
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from sqlite_utils.cli import cli
if __name__ == '__main__':
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(cli())
patched/sqlite-utils.py
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from sqlite_utils.cli import cli
# New imports
from unittest.mock import patch
from sqlite_utils.cli import VALID_COLUMN_TYPES
if __name__ == '__main__':
# Choices of the option `--type`
cli.commands['transform'].params[2].type.types[1].choices.append('DATETIME')
# The dicts has to be extended with a new type
with patch.dict('sqlite_utils.db.COLUMN_TYPE_MAPPING', {'DATETIME': 'DATETIME'}),\
patch('sqlite_utils.cli.VALID_COLUMN_TYPES', VALID_COLUMN_TYPES + ("DATETIME", )):
# Command is unchanged
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(cli())
And now it's working
$ sqlite-utils schema events.sqlite cards.chunk.get
CREATE TABLE "cards.chunk.get" (
[id] INTEGER PRIMARY KEY NOT NULL,
[timestamp] TEXT,
)
$ python patched/sqlite-utils.py transform events.sqlite cards.chunk.get --type timestamp DATETIME
$ sqlite-utils schema events.sqlite cards.chunk.get
CREATE TABLE "cards.chunk.get" (
[id] INTEGER PRIMARY KEY NOT NULL,
[timestamp] DATETIME,
)
Hey. Currently i do transformation with the type
--type TEXT
, but i noticed using the sqlalchemy based library dataset that is reading and writing differ depending on the column typesTEXT
,DATETIME
.Is it possible to alter a column type to
DATETIME
somehow using Sqlite-Utils?