Closed hacketiwack closed 2 years ago
Yes, I agree there is an issue. The solution you propose sounds sensible. I’m impressed that you were able to navigate the code and find the sql extension. Are you up for making a pr with this?
C might not be my native language :-P I'll try to do my best to create a working PR :-)
@hacketiwack - I gave this a go and updated the sqlext.c
borrowing code from GNOME's tracker project thats been stable for about 7yrs.
The extra trick is in the json
module - all search queries come through and map to a LIKE
query but it appears SQLite doesnt support collating with a like. Certainly when this matches what i observe in sqlite3
with the modified .so
loaded.
$ curl -s 'http://localhost:3689/api/search?type=artists&media_kind=music&query=foo'| jq
{
"artists": {
"items": [
{
"id": "2284445307090104385",
"name": "foo",
"name_sort": "foo",
"album_count": 4,
"track_count": 8,
"length_ms": 150247,
"time_played": "2022-02-04T15:30:08Z",
"time_added": "2022-01-29T13:57:03Z",
"in_progress": false,
"media_kind": "music",
"data_kind": "file",
"uri": "library:artist:2284445307090104385",
"artwork_url": "./artwork/group/23?v=1643559909"
}
],
"total": 1,
"offset": 0,
"limit": -1
}
}
$ curl -s 'http://localhost:3689/api/search?type=artists&media_kind=music&query=foó'| jq
{
"artists": {
"items": [
{
"id": "6275129402329078952",
"name": "foó",
"name_sort": "foó",
"album_count": 1,
"track_count": 1,
"length_ms": 6034,
"time_added": "2022-02-05T10:30:41Z",
"in_progress": false,
"media_kind": "music",
"data_kind": "file",
"uri": "library:artist:6275129402329078952",
"artwork_url": "./artwork/group/41?v=1644057070"
}
],
"total": 1,
"offset": 0,
"limit": -1
}
}
vs
$ sqlite3 ~/dev/_install/var/cache/owntone/songs3.db
SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.
sqlite> .load /home/ray/dev/_install/lib/owntone/owntone-sqlext.so
sqlite> select artist,title from files where artist = "foo";
foo|48k sine
foo|MZZZ
foo|abcd
foo|efgh
foo|XYZ
foo|AbCd
foo|one
foo|three
foó|with diacratic
sqlite>
So the Q will be if @ejurgensen will be happy to downgrade the searches to approx equals
instead like
. I didnt check what is requested from Apple Remote appl but the web interface search won't work with just the sqlext
change (but still worthy of a PR #1415 which I'll send through)
Changes in httpd_json
's search_*()
queries modified like this:
diff --git a/src/httpd_jsonapi.c b/src/httpd_jsonapi.c
index 5989c440..60c35a34 100644
--- a/src/httpd_jsonapi.c
+++ b/src/httpd_jsonapi.c
@@ -4290,9 +4290,9 @@ search_tracks(json_object *reply, struct httpd_request *hreq, const char *param_
if (param_query)
{
if (media_kind)
- query_params.filter = db_mprintf("(f.title LIKE '%%%q%%' AND f.media_kind = %d)", param_query, media_kind);
+ query_params.filter = db_mprintf("((f.title LIKE '%%%q%%' OR f.title = '%q') AND f.media_kind = %d)", param_query, media_kind);
else
- query_params.filter = db_mprintf("(f.title LIKE '%%%q%%')", param_query);
+ query_params.filter = db_mprintf("(f.title LIKE '%%%q%%' OR f.title = '%q')", param_query);
}
else
{
@@ -4351,9 +4351,9 @@ search_artists(json_object *reply, struct httpd_request *hreq, const char *param
if (param_query)
{
if (media_kind)
- query_params.filter = db_mprintf("(f.album_artist LIKE '%%%q%%' AND f.media_kind = %d)", param_query, media_kind);
+ query_params.filter = db_mprintf("((f.album_artist LIKE '%%%q%%' OR f.album_artist = '%q') AND f.media_kind = %d)", param_query, media_kind);
else
- query_params.filter = db_mprintf("(f.album_artist LIKE '%%%q%%')", param_query);
+ query_params.filter = db_mprintf("(f.album_artist LIKE '%%%q%%' OR f.album_artist = '%q')", param_query);
}
else
{
@@ -4413,9 +4413,9 @@ search_albums(json_object *reply, struct httpd_request *hreq, const char *param_
if (param_query)
{
if (media_kind)
- query_params.filter = db_mprintf("(f.album LIKE '%%%q%%' AND f.media_kind = %d)", param_query, media_kind);
+ query_params.filter = db_mprintf("((f.album LIKE '%%%q%%' OR f.album = '%q') AND f.media_kind = %d)", param_query, media_kind);
else
- query_params.filter = db_mprintf("(f.album LIKE '%%%q%%')", param_query);
+ query_params.filter = db_mprintf("(f.album LIKE '%%%q%%' OR f.album = '%q')", param_query);
}
else
{
@@ -4475,9 +4475,9 @@ search_composers(json_object *reply, struct httpd_request *hreq, const char *par
if (param_query)
{
if (media_kind)
- query_params.filter = db_mprintf("(f.composer LIKE '%%%q%%' AND f.media_kind = %d)", param_query, media_kind);
+ query_params.filter = db_mprintf("((f.composer LIKE '%%%q%%' OR f.composer = '%q') AND f.media_kind = %d)", param_query, media_kind);
else
- query_params.filter = db_mprintf("(f.composer LIKE '%%%q%%')", param_query);
+ query_params.filter = db_mprintf("(f.composer LIKE '%%%q%%' OR f.composer = '%q')", param_query);
}
else
{
@@ -4532,7 +4532,7 @@ search_playlists(json_object *reply, struct httpd_request *hreq, const char *par
query_params.type = Q_PL;
query_params.sort = S_PLAYLIST;
- query_params.filter = db_mprintf("((f.type = %d OR f.type = %d OR f.type = %d) AND f.title LIKE '%%%q%%')", PL_PLAIN, PL_SMART, PL_RSS, param_query);
+ query_params.filter = db_mprintf("((f.type = %d OR f.type = %d OR f.type = %d) AND (f.title LIKE '%%%q%%' OR f.title = '%q'))", PL_PLAIN, PL_SMART, PL_RSS, param_query);
ret = fetch_playlists(&query_params, items, &total);
if (ret < 0)
will, with the sqlext, give @hacketiwack what you're looking for:
$ curl -s 'http://localhost:3689/api/search?type=artists&media_kind=music&query=foo'| jq
{
"artists": {
"items": [
{
"id": "2284445307090104385",
"name": "foo",
"name_sort": "foo",
"album_count": 4,
"track_count": 8,
"length_ms": 150247,
"time_played": "2022-02-04T15:30:08Z",
"time_added": "2022-01-29T13:57:03Z",
"in_progress": false,
"media_kind": "music",
"data_kind": "file",
"uri": "library:artist:2284445307090104385",
"artwork_url": "./artwork/group/23?v=1643559909"
},
{
"id": "6275129402329078952",
"name": "foó",
"name_sort": "foó",
"album_count": 1,
"track_count": 1,
"length_ms": 6034,
"time_added": "2022-02-05T10:30:41Z",
"in_progress": false,
"media_kind": "music",
"data_kind": "file",
"uri": "library:artist:6275129402329078952",
"artwork_url": "./artwork/group/41?v=1644057070"
}
],
"total": 2,
"offset": 0,
"limit": -1
}
}
Excellent @whatdoineed2do! Thanks
The fix to apple remote appl is potentially tricky. The search request comes through bookended witth wildcards: for example the search for Foo
arrives at the search as *Foo*
and thats how the lexer picks this entire token
daap: DAAP request: '/databases/1/groups?meta=dmap.itemname,dmap.itemid,dmap.persistentid,daap.songartist,daap.songyear,daap.songtracknumber,daap.groupalbumcount,daap.songartistid&type=music&group-type=artists&sort=album&include-sort-headers=1&query=('daap.songartist:*Foo*'+'daap.songartist!:'+('com.apple.itunes.extended-media-kind:1','com.apple.itunes.extended-media-kind:32'))&session-id=1274859047'
daap: DAAP request: '/databases/1/groups?meta=dmap.itemname,dmap.itemid,dmap.persistentid,daap.songartist,daap.songyear,daap.songtracknumber,com.apple.itunes.cloud-id,daap.songartistid,daap.songalbumid,dmap.persistentid,daap.songtime,daap.songdatereleased,daap.songgenre,dmap.downloadstatus&type=music&group-type=albums&sort=album&include-sort-headers=0&query=('daap.songalbum:*Foo*'+'daap.songalbum!:'+('com.apple.itunes.extended-media-kind:1','com.apple.itunes.extended-media-kind:32'))&session-id=1274859047'
@ejurgensen - a simple way to do this is to modifer the lexer that creates the query and stripping the bookends *
and adding the extra clause:
diff --git a/src/parsers/daap_parser.y b/src/parsers/daap_parser.y
index 3c9109d0..70fe61c0 100644
--- a/src/parsers/daap_parser.y
+++ b/src/parsers/daap_parser.y
@@ -292,6 +292,8 @@ static void sql_append_dmap_clause(struct daap_result *result, struct ast *a)
bool is_equal = (a->type == DAAP_T_EQUAL);
char escape_char;
char *key;
+ char *wildcard_stripped;
+ int len;
if (!k || k->type != DAAP_T_KEY || !(key = (char *)k->data))
{
@@ -335,11 +337,27 @@ static void sql_append_dmap_clause(struct daap_result *result, struct ast *a)
}
else if (!dqfm->as_int && v->type == DAAP_T_WILDCARD)
{
+ sql_append(result, "(");
+ if (is_equal)
+ {
+ len = strlen((char *)v->data);
+ if (len > 2)
+ {
+ // bookended with '*', which we strip
+ wildcard_stripped = strdup(((char *)v->data)+1);
+ wildcard_stripped[len-2] = '\0';
+
+ sql_append(result, "%s = '%s' OR ", dqfm->db_col, wildcard_stripped);
+ free(wildcard_stripped);
+ }
+ }
+
sql_like_escape((char **)&v->data, &escape_char);
sql_str_escape((char **)&v->data);
sql_append(result, "%s", dqfm->db_col);
sql_append(result, is_equal ? " LIKE " : " NOT LIKE ");
sql_append(result, "'%s'", (char *)v->data);
+ sql_append(result, ")");
if (escape_char)
sql_append(result, " ESCAPE '%c'", escape_char);
return;
resulting in:
daap: Sorting songlist by album
daap: DAAP browse query filter: ('daap.songartist:*Foo*' 'daap.songartist!:' ('com.apple.itunes.extended-media-kind:1','com.apple.itunes.extended-media-kind:32'))
daap: SQL filter w/client mod: ((f.album_artist = 'Foo' OR f.album_artist LIKE '%Foo%') AND (1 = 1) AND (f.media_kind = 1 OR (1 = 0))) AND (f.data_kind <> 1)
And this brings back the correct result on the IOS app.
Is this acceptable for the DAAP parser? i'm not overly sure how to upd the lex tokens to automatially pull this apart
The proposed solution looks odd to me, I don't see how adding an "OR xxx" is an actual solution. Not sure what is meant by approx equals
? At first glance the changes to parser also don't look very nice.
The collation change in sqlext appears is not enough to suppport diacritic artist / album /... searches (sqliite only supports collation search critieria using =
) and thus need changes in jsonapi
and daap lexer
to enhance the existing artist / album /... searches which currently use search criteria like
To expand:
We add diacritic handling in the existing sqlext DAAP collation. BUT this will only work with queres like this select ... where artist = 'xxx'
and query will find the crème brûlée
using the text creme brulee
This 'select .... where artist like '%creme% collate DAAP
will not find crème brûlée
even with the diacritic update to sqlext. Tested this on sqlite
cmd line after loading the DAAP sql extension.
To get around this limitation (can't do LIKE
) make the existing search for album, artist query from:
select album from files where album like '%xxx%'
to
select album from files where (album = 'xxx' OR album like '%xxx%')
The =
will take care of the diacritics with approx equals
(sorry, bad phrasing): allowing xxx = creme brulee
find crème brûlée
.
The lexer mod tries to take care of search strings coming through as *xxx*
and tries to use the xxx
in the search as above; its v possible that the lexer can be enhanced to give another rules to handle this stripping of wildcards but my yacc/lex is limited.
Thanks for the additional explanations @whatdoineed2do. I'm not a C/C++ programmer (too rusty) and therefore my view on the subject might be wrong: does the code need to go through linting? Was it what rubbed you the wrong way @ejurgensen?
allowing xxx = creme brulee find crème brûlée
This hints at what is wrong, because xxx = creme won't find anything. The point of this kind of a search is to get substring matches. That's why LIKE is used, and why just adding "OR xxx =" isn't a solution.
Since collate doesn't work for LIKE, I think a proper solution would be to create a custom operator. Alternatively, something like unaccent(f.title) LIKE unaccent("%pattern%") might also work, but it is less extensible and not quite so pretty. Then there is also performance to think about.
I did some testing of the unicode LIKE operator in sqlean, and it seems pretty good. It also seems to handle "æ" LIKE "Æ" and even "o" LIKE "Ø". I will check what it would take to use it in OwnTone. Perhaps it is also possible to replace the current DAAP collation with sqlean's, which apparently is dependency free.
It looks very promising! Looking forward to it!
Good news and bad news. The good: sqlean's unicode LIKE implementation seems to work, and you can try it in this branch. The bad: I have mixed feelings about including this sqlean code. It's a peculiar mix of very advanced stuff and basic programming errors, some of which are easily identified just be turning on compiler warnings. It looks like it is quite old, perhaps its a result of going through multiple maintainers. I did report the errors, but so far no response. The errors are patched in the branch I linked to.
Also, their COLLATE isn't attractive in my view - for one, it sorts numbers before letters.
Writing our own LIKE seems a bit risky, so the best would be to find an existing, battle-hardened implementation based on libunistring. My Googling hasn't uncovered that yet, but perhaps you would have better luck.
Looks like sqlean is partially copied from sqlite's icu, which could explain why it is as it is. Using icu would add a new dependency, which I don't like, so instead I think I will try to convert the current dependency to a libunistring dependency. Hopefully that isn't too difficult or risky.
It seems like going this route means that LIKE lookups won't be able to use indexes. Obviously bad for performance, but on the other hand, I don't imagine the current wildcard lookups have great performance either, so maybe it will be ok.
Thanks @ejurgensen. Yes, it seems that there's no straightforward way to do it. I will investigate as well from my side if there are different options.
I've made a libunistring-based LIKE implementation in this new branch, but alas, it has poor performance. It is 4-5x slower than both the current LIKE and the sqlean version. It makes the search noticeably slower on my RPi. The culprit is libunistring's normalization and case folding (u32_casefold).
I'll see if I can find some way of speeding it up.
Instead of doing the normalization on every comparison, would it not make sense to do it during the scan (on insert/update)?
We could add a column for the normalized value (or maybe we can reuse the existing sort fields?), during the scan store the normalized value in the db. Before doing a search, we then would have to normalize the search parameter and do the LIKE statement against the new column with the normalized value.
There is also a sqlite extension for full text search: https://www.sqlite.org/fts5.html I have zero experience with it, but it looks like it can handle diacritics.
Unfortunately, reusing the existing sort columns wouldn't work, because a search for e.g. "test 2" would fail, since it is stored as something like "test 00002". So for stored normalization and case folding, it would be necessary to add quite a few mirror columns + additional logic on when to use these columns, and I think that is a considerable downside.
Except for libunistring, it seems normalizing on the fly isn't that expensive. Both sqlean and libicu are on par with vanilla LIKE. Need to test some more, however.
It's also a nice quality of on-the-fly that a regular select title from files where title like "%xxx%"
yields a sensible result.
fts5 certainly looks very interesting. As I understand it would add the capability to make fast combined searches (e.g. beatles hard days
), which especially for the web UI would be nice. It does look a bit daunting, though :-)
Looks like using fts5's functions for case folding and diacritics removal is a winner. I copied the functions into sqlext.c and then called those instead of the slow libunistring call, and now it is plenty fast. It's in the db_unicode_search1 branch. I will test it a bit more, but so far it is promising.
Now to listen to some of the French music I realize I have after testing diacritics.
That's absolutely awesome. I need to build this branch to test it myself. Thanks a lot.
I've merged it now. Thanks for bringing this forward.
I'm looking forward to using this feature! Thanks a lot!
Me too. Good work all finding a solution. Finally I can search for my ørganek album and it may hit.
The search needs improvements. Especially when it comes to search for titles or artist names with diacritics.
Current situation In search fields of any Owntone server clients - Apple Remote app, Owntone web interface, etc. - it is not possible to search for an item name - tracks, artists, playlists, podcasts, albums, audiobooks, etc. - without specifying exactly name of the item with its correct spelling: e.g. searching "Lumière" will not bring the same results as searching "Lumiere", or "Crăciun fericit" should be findable with "Craciun fericit". We're not talking here about the transliteration cases, which should be the discussion for another enhancement.
Expected situation When a user search for an item in their library, they should not have to enter any accented characters (with diacritics) in order to find them. A search should be able to find all items being written with or without accents.
Idea I've looked at the source code and, in my opinion, one of the elegant way to handle this enhancement is to amend the method
sqlext_daap_unicode_xcollation
in the file sqlext.c. This collation function should:u8_normalize
of GNU libunistring).uc_is_property_diacritic
)