Closed simonw closed 1 year ago
Here's a query that can do that:
select
'contains_non_integer' as col
from
sortable
where
cast(cast(sortable AS INTEGER) AS TEXT) != sortable
limit
1
This will return 0 results if every column is an integer - it will shortcut the query and return a result as soon as it finds a non-integer column.
Example for a column containing only integers: https://latest.datasette.io/fixtures?sql=select%0D%0A++%27contains_non_integer%27+as+col%0D%0Afrom%0D%0A++sortable%0D%0Awhere%0D%0A++cast%28cast%28sortable+AS+INTEGER%29+AS+TEXT%29+%21%3D+sortable%0D%0Alimit%0D%0A++1
And for a column that instead contains floating point: https://latest.datasette.io/fixtures?sql=select%0D%0A++%27contains_non_integer%27+as+col%2C+*%0D%0Afrom%0D%0A++sortable%0D%0Awhere%0D%0A++cast%28cast%28sortable_with_nulls+AS+INTEGER%29+AS+TEXT%29+%21%3D+sortable_with_nulls%0D%0Alimit%0D%0A++1
Maybe sqlite-utils
should grow utility functions for guessing types in this way.
Idea: return the first detected value that is not an integer. Then code can see if that value is not-a-float - if it's obviously not-a-float we don't have to run the float detection on it.
Running this query against my dogsheep-beta index is a good test, because the key
column there starts with 25,000 numeric tweet IDs before getting the first non-integer GitHub commit. It takes 421ms.
select
'contains_non_integer' as col, *
from
search_index
where
cast(cast(key AS INTEGER) AS TEXT) != key
limit
1
This method works for detecting integers but doesn't work for floats - because cast(cast(key AS FLOAT) AS TEXT)
run against e.g. 415
produces 415.0
.
Useful thread on StackOverflow: https://stackoverflow.com/questions/32528759/how-to-check-if-a-value-is-a-number-in-sqlite
This would be really useful, especially as a prompt that displays at the top of the table page (via a fetch()
so as not to delay page loading).
Every time I demonstrate CSV import to someone I have to give them a little spiel about how important it is to ensure the types are correct - this would be a good way to improve that.
In https://github.com/simonw/sqlite-utils/issues/282 I added the sqlite-utils insert --detect-types
mechanism, but it doesn't use any of the clever SQL from this thread - it instead wraps the ingested CSV data in a not-yet-documented TypeTracker
class and uses that to figure out what the types should be.
So there's no sqlite-utils
Python library solution for detecting types for an existing table yet.
Could I use a custom sqlite3
aggregate Python function for this? Might be a bit slow.
I'm not going to do this. The new example display from here solves the same problem:
It would be great to suggest things like "the category column is all numbers but is currently TEXT, you could convert it to INTEGER"