Open simonw opened 4 years ago
This needs querystring parameter design. Some options:
?_sort_clause=cast(mycol+as+integer)
and ?_sort_clause_desc=cast(mycol+as+integer)
- allowing any expression. This would need to be disabled if arbitrary SQL was turned off, similar to the restrictions on ?_where=
.?_sort_numeric=mycol
and ?_sort_numeric_desc=mycol
- this would cast to real
which would work on integer values as well. It could be allowed even when arbitrary SQL was disabled.I could even let plugins define new sort types. Imagine a plugin that enables this:
?_sort_date_desc=mycol
- where it knows how to handle specific date formats, or even uses dateutil.parser.parse
.
Sorting by date when the column has a junk date format in it is such a column need it should maybe ship in Datasette by default - though I've been trying to avoid adding heavy dependencies like dateutil
if I can get away with it.
This does feel like a weird plugin hook just because there aren't really THAT many different use-cases that plugins could solve. The ones I can think of are:
Could this work if I just allow _sort_clause=
?
One possible solution for the no-arbitrary-SQL case: users can define sort orders in metadata.json/yml
. So if you want to enable sort-by-distance without enabling arbitrary SQL you could add something like this:
databases:
mydb:
tables:
museums:
sort_clause: bm25(fts)
There's something interesting about figuring out which sort options should be offered in the column actions menu.
Two options:
metadata.yml
for the table.The simplest solution would be for Python code to scan all of the visible values on the current page and show the column action for "sort by this numeric" based purely on that. I already do that in the JavaScript for "are there any blank values in the first page?" here:
I think the first version of this feature involves implementing ?_sort_numeric=col
and ?_sort_numeric_desc=col
plus the JavaScript to detect if those values should be shown in the column actions menu.
One question: how to reflect that this is happening in the current sort UI. This menu here for example:
And this interface: how should it indicate that a text is currently sorted numerically v.s. sorted alphabetically, and allow the user to switch from one to the other?
For the "Sort by X" select menu case... I could automatically expand that menu to contain extra options for "Sort numerically by X" for each TEXT column in the table. That's a pretty good option.
For the action cog menu, I can add the extra options to the cog menu - and rely on the fact that the title of the page will say "Sorted numerically by colname descending".
This is enough of a design to build a working prototype.
The Sort by <select>
menu needs a rethink: https://github.com/simonw/datasette/blob/4f7c0ebd85ccd8c1853d7aa0147628f7c1b749cc/datasette/templates/table.html#L80-L87
If it's going to include sort by numeric options it needs a different format - since ?_sort=colname
needs to also support ?_sort=numeric_colname
- but shouldn't clash with a weird column already named numeric_colname
.
Can I come up with a value syntax for this that is guaranteed not to clash with a weirdly named existing column?
I think so. I could use {"type": "numeric", "column": "mycolumn"}
as the value - then if a column has a name that is itself valid JSON (weird but possible) the _sort=
value would be {"type": "default", "column": "{\"type\": \"numeric\", \"name\": \"column\"}"}
.
Simpler option: ?_sort=
column values look like this:
mycolumn
- for sort by columnmycolumn$numeric
- for sort by column after cast to floatmycolumn$morename$default
- for the edge case where the column name itself contains a $ symbolEven better solution: use URL encoding in the parameter details. This is consistent with how ?_next=
tokens work, e.g. ?_next=0.291861560261786%2Ce%2Cj
.
So the format can be:
mycolumn
urlencoded-mycolumn$castname
For most columns this will look like: ?_sort=score$numeric
For columns with a $
in their name it will be ?_sort=score%24hasdollar$numeric
Problem: both $
and ,
are usually URL encoded anyway. I need a character which isn't encoded by default, so that I can use its encoded form to show it is part of the column name and its un-encoded form to split the cast indicator.
_
is a candidate here - not encoded by default, but can be encoded as %5F
.
The other unreserved non-alphanumeric characters are -
, .
, _
, ~
.
Of these, ~
is least likely to show up in a column name. So I'll use that.
mycolumn
mycolumn~numeric
mycolumn%7Ewith%7Etildes~numeric
Urgh this isn't going to work. %7E~%7E
gets decoded as ~~~
so I wouldn't be able to tell the difference.
I could use double-percentage-encoding here instead. I feel like there's a simpler solution that I'm missing (and that may well be in use within Datasette already, I'm not doing great thinking this morning).
Much easier solution: if the suffix is ~numeric
then treat it as the column name sorted numerically. If the suffix is missing OR the suffix is ~default
, sort without casting. Only add the ~default
suffix if the column name itself contains at least one ~
symbol.
Using ~
because it doesn't need to be URL-encoded.
Something to watch out for: ""
empty strings cast to 0.0
:
select cast("100" as real), "100", cast(null as real), cast("" as real)
cast("100" as real) | "100" | cast(null as real) | cast("" as real) |
---|---|---|---|
100.0 | 100 | 0.0 |
cast(nullif(colname, '') as real)
can fix this - it will treat ''
the same as null
.
Also need to rethink this template logic that decides if to show a column as sorted or not: https://github.com/simonw/datasette/blob/4f7c0ebd85ccd8c1853d7aa0147628f7c1b749cc/datasette/templates/_table.html#L10-L14
Prototype so far:
diff --git a/datasette/views/table.py b/datasette/views/table.py
index ea11a51..d61f8bd 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -497,17 +497,32 @@ class TableView(RowTableShared):
if sort and sort_desc:
raise DatasetteError("Cannot use _sort and _sort_desc at the same time")
+ def parse_sort(sort):
+ if "~" in sort:
+ if sort.endswith("~default"):
+ col = sort.rsplit("~", 1)[0]
+ return col, escape_sqlite(col)
+ elif sort.endswith("~numeric"):
+ col = sort.rsplit("~", 1)[0]
+ return col, "cast(nullif({}, '') as real)".format(escape_sqlite(col))
+ else:
+ return sort, escape_sqlite(sort)
+ else:
+ return sort, escape_sqlite(sort)
+
if sort:
- if sort not in sortable_columns:
- raise DatasetteError("Cannot sort table by {}".format(sort))
+ sort_column, sort_clause = parse_sort(sort)
+ if sort_column not in sortable_columns:
+ raise DatasetteError("Cannot sort table by {}".format(sort_column))
- order_by = escape_sqlite(sort)
+ order_by = sort_clause
if sort_desc:
- if sort_desc not in sortable_columns:
- raise DatasetteError("Cannot sort table by {}".format(sort_desc))
+ sort_column, sort_clause = parse_sort(sort_desc)
+ if sort_column not in sortable_columns:
+ raise DatasetteError("Cannot sort table by {}".format(sort_column))
- order_by = "{} desc".format(escape_sqlite(sort_desc))
+ order_by = "{} desc".format(sort_clause)
from_sql = "from {table_name} {where}".format(
table_name=escape_sqlite(table),
Maybe ?_sort_numeric=col
and ?_sort_numeric_desc=col
would be better here.
I could add these sorting links to the cog menu for any TEXT
columns.
If a text column actually contains numbers, being able to "sort by column, treated as numeric" would be really useful.
Probably depends on column actions enabled by #690