Prefer full-text query match on ordering

jmargatan commented 8 years ago

Consider the following use case.

Tab Opens:

George Matthew - Yahoo Mail
Inbox - george.matthew@gmail.com - Gmail

When the query is gmail, the results are ordered as such (bold indicates highlighted letters):

George Matthew - Yahoo Mail
Inbox - george.matthew@gmail.com - Gmail

The proposal is to return the tab with full-text match on top of the fuzzy result, here is how it looks:

Inbox - george.matthew@gmail.com - Gmail
George Matthew - Yahoo Mail

@janraasch: What do you think? :)

janraasch commented 8 years ago

Hi @jmargatan, you are absolutely right. There is an open issue on the fuzzy search module used here: https://github.com/mattyork/fuzzy/issues/3. I am sure they would accept a pull request implementing this.

jmargatan commented 8 years ago

@janraasch: I glanced through mattyork/fuzzy's code. The current aggressive/greedy matching strategy will need some major refactoring to have full-text capability and this may not align with the direction of the project, "1k standalone fuzzy search / fuzzy filter".

After digging into this, I think the most optimal way to achieve a full-text search will require some dynamic programming approach, commonly referred to as the LCS problem. It won't be as fast as greedy matching--"fast" is a relative term--but we can tweak it a little bit by caching intermediary computational result.

As I start typing my library I bumped into jeancroy/fuzzaldrin-plus. I read through the description, it looked promising, haven't looked at the code yet, but it seems to align to what we want to achieve. Would you be open to check if this would be a good alternative? Thanks in advance.

janraasch commented 8 years ago

Hi @jmargatan. Thank you for your comment. I appreciate that you have not forgotten about this.

I would like to reply with two things.

I think considering the use case of fuzzy.js here, which is really just filtering through a couple of urls and titles (let's assume a maximum of 100 or 200 items/urls+titles to search), we do not necessarily have to solve the LCS problem in its entirety and most performant way to satisfy the use case at hand. There is already a pull request (mattyork/fuzzy/pull/12) on the fuzzy.js repo which looks promising. What do you think?
Of course fuzzaldrin-plus looks great, too. Plus I am pretty sure it is well proven at least for the file path use case. However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin. Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '<bold>', post: '</bold>' } to fuzzy.js.

janraasch commented 8 years ago

Hi @jmargatan, I finally remember the other js-library I really wanted to use for this, but could not at the time. Now, once https://github.com/krisk/Fuse/issues/6 lands in a non-beta release of https://github.com/krisk/Fuse I think this will fix our issue.

janraasch commented 8 years ago

Alright, good news. This seems to work. Would you be willing to test this? I released a beta for you to check it out: https://github.com/janraasch/tab-ahead/releases/tag/1.3.0-beta1. You can download the TabAhead.zip file, then unzip it, and under chrome://extensions/ enable Developer mode to be able to Load unpacked extension....

I would love to get your feedback on this, @jmargatan.

jmargatan commented 8 years ago

Woohoo. I'll give it a try once I get home tonight. Thanks @janraasch!

jmargatan commented 8 years ago

@janraasch I wasn't able to test it. I disabled the original one and upload the beta version as you suggested but Alt+T does not work and the search box itself does not produce any matches. Thoughts?

janraasch commented 8 years ago

hi @jmargatan, I'm sorry. You're right. There is some problem when packaging the new dependencies. Don't worry, I'll figure it out and release another beta. Thank you for being willing to test this.

janraasch commented 8 years ago

Alright, can you please try again with https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta2.

Also, you do not have to deinstall the original version from the app store, the two can run alongside each other, so you can really compare the results.

Finally, the keyboard shortcut can be configured on the chrome://extensions/ page as well. There is a link named Keyboard shortcuts on the bottom right.

Let me know, when you get around to having a look at this, @jmargatan,

jmargatan commented 8 years ago

@janraasch, I think this new one is much better. :shipit: Thanks for addressing this issue, I really appreciate it.

screen shot 2016-03-04 at 10 33 27 am

Also thanks for pointing the keyboard shortcut. Although after uploading the the beta, the Alt-T shortcut stopped working. I have yet to figure that out.

janraasch commented 8 years ago

Very well then, I'll cut a release once the version of Fuse.js we use is out of beta. Thank you for your help. I'll close this issue, once the new version is available on the Chrome Web Store.

jeancroy commented 8 years ago

Hi, author of fuzzaldrin-plus here, sorry of being late to the party.

Of course fuzzaldrin-plus looks great, too. Plus I am pretty sure it is well proven at least for the file path use case.

Actually both works moslty the same, the main difference is that there's a second match context in the file name.

However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin.

Yes it was written as a substitute for addressing user report in atom text editor. It may now have a live of it's own do I'll fix the documentation accordingly. Basic is this


fuzzaldrin  = require("fuzzaldrin-plus")

//Filter a list of candidate - strings
candidates = ...
query = ...
results = fuzzaldrin.filter(candidates, query)

//Filter list of candidate objects
results = fuzzaldrin.filter(candidates, query,  key: 'name')

Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '', post: '' } to fuzzy.js.

character_positions = fuzzaldrin.match(candidate, query)

I dont really like this syntax but it was what was required for atom. I may provide a function that wrap inside specified stard/end tag.

I'll cut a release once the version of Fuse.js we use is out of beta.

Fuse is based on edit distance. And thus is much better comparing word (eg spellcheck) between them than expanding shortcut into a large result. I see v2 of Fuse tokenize the candidate into multiple word then do the comparison over each, summing result.

That approach may works well, up to the point the user type a bit of this word, and a bit of that word. You'll also see in the screenshot that every email address part is compared to gmail so a lot is highlighted.

jmargatan commented 8 years ago

@janraasch: The current solution is fantastic but some side of the aggressive matching still hurts the performance. Please refer to the example below. What's your thought on it? :)

screen shot 2016-04-19 at 10 29 19 am

Thanks for the insight on jeancroy/fuzzaldrin-plus, @jeancroy, would be interesting to compare the result between these two.

janraasch commented 8 years ago

@jeancroy, thank you very much for your detailed explanation. I am sorry for answering so late, but I am quiet busy as I am sure we all are (open source... :)).

@jmargatan Could you elaborate a little bit on what you mean by

aggressive matching still hurts the performance

I am sorry, but I am not sure what you mean just by looking at the screenshot.

And, lastly:

would be interesting to compare the result between these two.

Definitely, I agree. I will make sure to do this. (Now that I know the API ;))

jmargatan commented 8 years ago

@janraasch Certainly. :)

On the screenshot, I was trying to illustrate there are 2 candidates and the first one does not have the full text, "yahoo mail", but the the second one does at the end. In this case, I would assume the second candidate should have been ranked higher. At least, from LCS perspective. What's your thought on it?

janraasch commented 8 years ago

Just to clarify is that screenshot taken using the beta version or the one currently available on the web store?

jmargatan commented 8 years ago

This is from the Chrome Web Store version.

screen shot 2016-04-19 at 12 19 48 pm

I am not aware of a way to update a Chrome Extension. I assume I am on the latest version. Although the Web Store does say "Updated: November 16, 2015".

screen shot 2016-04-19 at 12 22 47 pm

janraasch commented 8 years ago

Ok.

jeancroy commented 8 years ago

Would be interesting to compare the result between these two

Original gmail0

Fuse.js beta fuse

FuzzAldrin-Plus gmail

Made this demo to toy with the project, hopefully it clarify the api/usage a bit too.

jmargatan commented 8 years ago

@jeancroy: Whoa this is cool! Let's see what @janraasch thinks. :)

janraasch commented 8 years ago

IMHO the Fuse.js beta version seems to work fine, especially, with this commit https://github.com/janraasch/tab-ahead/commit/877dbb83803ba7516416071cd34aaa0782917924 on master, which takes care of making the results less noisy by only highlighting matches with more than one character. I should probably release a new beta using the current master so you can check it out, but maybe it is also easy enough to imagine the screenshot above w/o the single character matches :). What do you think, @jmargatan?

janraasch commented 8 years ago

@jeancroy: For your examples you seem to be only searching in the title String, can you search through title and url like with Fuse.js?

var tabs = [{title: 'my-title', url: 'my-url'}]
var options = {
  keys: ['url', 'title']
}
var f = new Fuse(tabs, options);
var result = f.search('my');

janraasch commented 8 years ago

Alright, so here https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta3 is the current version built from master.

You can download the TabAhead.zip file, then unzip it, and under chrome://extensions/ enable Developer mode to be able to Load unpacked extension.... The keyboard shortcut can be configured on the chrome://extensions/ page as well. There is a link named Keyboard shortcuts on the bottom right.

Could you take this for a spin, @jmargatan? Maybe compare the results with your screenshot from this comment https://github.com/janraasch/tab-ahead/issues/34#issuecomment-212032984.

jmargatan commented 8 years ago

@janraasch Thanks for packing up this build. I tested with both scenarios:

From the second screenshot, the ranking does improve although the highlighted characters are rather confusing, note that the oo at ...Google Search... was highlighted.

janraasch commented 8 years ago

Ok, one little caveat as this discussion is getting quiet long: We have to be a little bit careful not too overestimate the meaning of a few handpicked samples, because there are currently about 4600 people using this extension (with the old version) and none of them complained even though the current algorithm used is simply incorrect from a technical point of view.

Which is why I would say even though

highlighted characters are rather confusing, note that the oo at ...Google Search... was highlighted.

this is still a better solution, because

ranking does improve

so I would like to

Take this version (the current beta using Fuse.JS) and release that as v1.3.0 to the public.
Close this issue and see what the feedback of the other users is like. (If there is no feedback, we did well, that is free software for you :)). If user counts start dropping then we'll rollback and go on from there.

What do you say?

jeancroy commented 8 years ago

For your examples you seem to be only searching in the title String, can you search through title and url like with Fuse.js?

There's the api to search for a specific key. fuzzaldrin.filter(candidates, query, {key:"title"})There's not yet the API to search for multiple of them. I'm thinking of averaging the score over search space, maybe let the user specify a weigh for every property. Other option is to select the best field, and still weight from which field it is.

I may add it, I'm a bit busy now, and most importantly, i'm now being super careful with updating that project because atom has so many users. It is still possible to achieve this goal using the fuzzaldrin.score(candidate,query) inside a double loop: foreach items / foreach indexedFields

none of them complained even though the current algorithm used is simply incorrect from a technical point of view.

I guess it depend on the complexity of the search text as well as the expectation of the user. One thing I think fuse will handle better is spelling mistake (gnail vs gmail) One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail) If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.

Close this issue and see what the feedback of the other users is like.

That sound like I plan. I can ping back if the multiple field feature is implemented. Or more cleanly you can open/subscribe to an issue that ask for it

janraasch commented 8 years ago

@jeancroy thank you very much for the detailed insight into fuzzaldrin and your informed opinion on the fuse vs. fuzzaldrin matter considering the special use case at hand.

With your comment

One thing I think fuse will handle better is spelling mistake (gnail vs gmail) One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail) If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.

I am now even more confident to release the fuse-version to the world.

jmargatan commented 8 years ago

I agree with you both. Thank you both for looking into this.

janraasch commented 8 years ago

Just released v1.3.0 to the Chrome Web Store.

janraasch / tab-ahead

Prefer full-text query match on ordering #34