Open EmilianoCostantini opened 5 years ago
Is this feature request essentially to add fuzzy matching? That's something I've wondered about.
Fuzzy matching would be awesome, but even more taxing than what I was thinking. My idea was deterministically calculate the keywords permutations, then add regExps. No stochasticity involved.
Say for instance the user enters in the searchbox the keywords:
xxxxxx yyyyyy zzzzzz wwwwww
That would be 4 different words, that can be ordered in 4! (that is, 24) different ways:
xxxxxx yyyyyy zzzzzz wwwwww
xxxxxx yyyyyy wwwwww zzzzzz
xxxxxx wwwwww zzzzzz yyyyyy
wwwwww yyyyyy zzzzzz xxxxxx
zzzzzz yyyyyy xxxxxx wwwwww
(and so on and so forth, up till the 24th possible permutation.)
Each permutation should be converted to regexp:
/([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)wwwwww([\s\S]*)/i
/([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)/i
/([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)/i
/([\s\S]*)wwwwww([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)xxxxxx([\s\S]*)/i
/([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)/i
Then each bookmark matching at least one of the 24 regexps in either title or description/comments should be added to results.
As you can check on RegExr, such a bookmark would be —for instance— one containing the substrings:
foobar--xxxxxx----foobar foobar foobar...YYYYYY...foobar foobar foobar****ZZzzzz**foobar foobar foobar__wwwWWW____foobar
regardless both the order and the case, just as long as each and every keyword is there.
Even making it as efficient as possible, though, such kind of business logic could relevantly impact performances; therefore it could be good to make it optional by means of dedicated item/s in the addon Preferences
page.
I have surprisingly little knowledge about algorithms so I'll have to do some research on what you've touched on in your comment before coming back to this ticket, but I would like to incorporate a feature like this at some point. It might be a while though, I need to prioritise performance issues with enormous databases.
You're doing a priceless job, really. The uncanny decision to remove the 'Description' field by Mozilla folks has caused me some serious discomfort. Buku coupled with your addon could really be a life saver for those who, like me, heavily depend on such field.
P.S. as you can see I've tried to move the feature request upstream; this seems to make more sense to me.
Ah, a heavy user of the description field, you are the perfect target audience for a question I've been pondering.
Which of these would be preferable to you?
>
(present behaviour)The name is almost negligible to me, therefore I would choose name and description without hesitation :)
Note to self: Given that there's work ongoing to try and move the filtering logic over to the host, when that happens it may be possible to easily implement fuzzy finding for each regex-matched group using a library like this. fzf
(written in Go) is a good benchmark for how intuitive the matching can be.
Buku has interesting options involving search with regexps, such as:
buku --sany foo bar baz
buku --deep foo bar baz
It would be great if they could be leveraged from within Bukubrow searchbox, maybe by means of dedicated settings in the addon
Preferences
.Consider for instance a bookmark having
as substring in either title or comments. Such bookmark should appear in Bukubrow search results for each of the following keyword combinations:
just like it does in the ordinary bookmark sidebar's searchbox (
Ctrl-B
).