simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.67k stars 111 forks source link

sqlite-utils search command #192

Closed simonw closed 4 years ago

simonw commented 4 years ago

A command that knows how to run a search against a FTS enabled table and return results ranked by relevance.

simonw commented 4 years ago

Relevant example using a SQLite CTE: https://github.com/simonw/datasette/issues/268#issuecomment-675725464

CTEs were added in SQLite 3.8.3 in 2014-02-03 so they should be safe to use. If someone tries to run sqlite-utils search no an older version of SQLite they'll get an error, which I think is OK.

simonw commented 4 years ago

Just realized there's a problem with the SQL I am using here: joining to get rank in this way only works against FTS5 tables, it doesn't work against FTS4.

https://github.com/simonw/sqlite-utils/blob/c8a900df59efd34f394c863c0adff9912f1bf1d7/sqlite_utils/db.py#L1303-L1319

simonw commented 4 years ago

I could depend on my sqlite-fts4 library to solve this: https://github.com/simonw/sqlite-fts4

Or I could say that only FTS5 is supported for ranked searches - but still support .search() against FTS4 just without the option to sort by relevance.

simonw commented 4 years ago

FTS5 was added in SQLite 3.9.0 in 2015-10-14 - so about a year after CTEs, which means CTEs will always be safe to use with FTS5 queries.

simonw commented 4 years ago

I want .search() to work against both FTS5 and FTS4 tables - but sort by rank should only work for FTS5.

This means I need to be able to introspect and tell if a table is FTS4 or FTS5.

simonw commented 4 years ago

This is a bit surprising:

(sqlite-utils) sqlite-utils % sqlite-utils search 24ways-fts4.db articles maps -c title      
[{"rowid": 41, "title": "What Is Vagrant and Why Should I Care?", "rank": -1.9252039178908076},
 {"rowid": 298, "title": "First Steps in VR", "rank": -1.9945466378736434},
 {"rowid": 43, "title": "Content Production Planning", "rank": -2.1928058363046143},
 {"rowid": 100, "title": "Moo'y Christmas", "rank": -2.2698482999851675},
 {"rowid": 91, "title": "Infinite Canvas: Moving Beyond the Page", "rank": -2.290928999035195},
 {"rowid": 175, "title": "Front-End Code Reusability with CSS and JavaScript", "rank": -2.498731782924352},
 {"rowid": 209, "title": "Feeding the Audio Graph", "rank": -2.619968930100356},
 {"rowid": 296, "title": "Animation in Design Systems", "rank": -2.62060151817201},
 {"rowid": 118, "title": "Ghosts On The Internet", "rank": -2.7224894534521087},
 {"rowid": 77, "title": "Colour Accessibility", "rank": -2.7389782859427343},
 {"rowid": 245, "title": "Web Content Accessibility Guidelines 2.1\u2014for People Who Haven\u2019t Read the Update", "rank": -2.9750992611162888},
 {"rowid": 56, "title": "Helping VIPs Care About Performance", "rank": -3.0819662908932535},
 {"rowid": 109, "title": "Geotag Everywhere with Fire Eagle", "rank": -3.1371975973877277},
 {"rowid": 203, "title": "Jobs-to-Be-Done in Your UX Toolbox", "rank": -3.2416719461682733},
 {"rowid": 276, "title": "Your jQuery: Now With 67% Less Suck", "rank": -3.4947916564653028},
 {"rowid": 58, "title": "Beyond the Style Guide", "rank": -3.7508321464447905},
 {"rowid": 225, "title": "Good Ideas Grow on Paper", "rank": -4.120077674716844},
 {"rowid": 168, "title": "Unobtrusively Mapping Microformats with jQuery", "rank": -4.662224207228984},
 {"rowid": 27, "title": "Putting Design on the Map", "rank": -5.667327088267961},
 {"rowid": 220, "title": "Finding Your Way with Static Maps", "rank": -9.952534352591737}]

I requested just -c title but also got back rowid and rank.

simonw commented 4 years ago

Also that order looks incorrect. It looks like most relevant came back last, not first.

simonw commented 4 years ago

I'm going to have it only output the exact -c columns you requested (if you requested any). Add --rank to specify the rank column, since you may not know what its name is.

simonw commented 4 years ago

This looks pretty good now!

% sqlite-utils search 24ways.db articles simon -c title -c author -t 
title                                                                          author
-----------------------------------------------------------------------------  ------------------
Don't be eval()                                                                Simon Willison
DOM Scripting Your Way to Better Blockquotes                                   Jeremy Keith
Swooshy Curly Quotes Without Images                                            Simon Collison
The Articulate Web Designer of Tomorrow                                        Simon Collison
Writing Responsible JavaScript                                                 Drew McLellan
Going Nuts with CSS Transitions                                                Natalie Downe
Managing a Mind                                                                Christopher Murphy
Design Systems                                                                 Laura Kalbag
Bringing Your Code to the Streets                                              Ruth John
Taming Complexity                                                              Simon Collison
Unobtrusively Mapping Microformats with jQuery                                 Simon Willison
Crafting the Front-end                                                         Ben Bodien
Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist  Natalie Downe
Fast Autocomplete Search for Your Website                                      Simon Willison