Closed jmargatan closed 8 years ago
Hi @jmargatan, you are absolutely right. There is an open issue on the fuzzy search module used here: https://github.com/mattyork/fuzzy/issues/3. I am sure they would accept a pull request implementing this.
@janraasch: I glanced through mattyork/fuzzy's code. The current aggressive/greedy matching strategy will need some major refactoring to have full-text capability and this may not align with the direction of the project, "1k standalone fuzzy search / fuzzy filter".
After digging into this, I think the most optimal way to achieve a full-text search will require some dynamic programming approach, commonly referred to as the LCS problem. It won't be as fast as greedy matching--"fast" is a relative term--but we can tweak it a little bit by caching intermediary computational result.
As I start typing my library I bumped into jeancroy/fuzzaldrin-plus. I read through the description, it looked promising, haven't looked at the code yet, but it seems to align to what we want to achieve. Would you be open to check if this would be a good alternative? Thanks in advance.
Hi @jmargatan. Thank you for your comment. I appreciate that you have not forgotten about this.
I would like to reply with two things.
fuzzaldrin-plus
looks great, too. Plus I am pretty sure it is well proven at least for the file path use case. However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin. Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '<bold>', post: '</bold>' }
to fuzzy.js.Hi @jmargatan, I finally remember the other js-library I really wanted to use for this, but could not at the time. Now, once https://github.com/krisk/Fuse/issues/6 lands in a non-beta release of https://github.com/krisk/Fuse I think this will fix our issue.
Alright, good news. This seems to work. Would you be willing to test this? I released a beta for you to check it out: https://github.com/janraasch/tab-ahead/releases/tag/1.3.0-beta1. You can download the TabAhead.zip
file, then unzip it, and under chrome://extensions/
enable Developer mode
to be able to Load unpacked extension...
.
I would love to get your feedback on this, @jmargatan.
Woohoo. I'll give it a try once I get home tonight. Thanks @janraasch!
@janraasch I wasn't able to test it. I disabled the original one and upload the beta version as you suggested but Alt+T does not work and the search box itself does not produce any matches. Thoughts?
hi @jmargatan, I'm sorry. You're right. There is some problem when packaging the new dependencies. Don't worry, I'll figure it out and release another beta. Thank you for being willing to test this.
Alright, can you please try again with https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta2.
Also, you do not have to deinstall the original version from the app store, the two can run alongside each other, so you can really compare the results.
Finally, the keyboard shortcut can be configured on the chrome://extensions/
page as well. There is a link named Keyboard shortcuts
on the bottom right.
Let me know, when you get around to having a look at this, @jmargatan,
@janraasch, I think this new one is much better. :shipit: Thanks for addressing this issue, I really appreciate it.
Also thanks for pointing the keyboard shortcut. Although after uploading the the beta, the Alt-T shortcut stopped working. I have yet to figure that out.
Very well then, I'll cut a release once the version of Fuse.js
we use is out of beta. Thank you for your help. I'll close this issue, once the new version is available on the Chrome Web Store.
Hi, author of fuzzaldrin-plus here, sorry of being late to the party.
Of course fuzzaldrin-plus looks great, too. Plus I am pretty sure it is well proven at least for the file path use case.
Actually both works moslty the same, the main difference is that there's a second match context in the file name.
However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin.
Yes it was written as a substitute for addressing user report in atom text editor. It may now have a live of it's own do I'll fix the documentation accordingly. Basic is this
fuzzaldrin = require("fuzzaldrin-plus")
//Filter a list of candidate - strings
candidates = ...
query = ...
results = fuzzaldrin.filter(candidates, query)
//Filter list of candidate objects
results = fuzzaldrin.filter(candidates, query, key: 'name')
Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '
', post: ' ' } to fuzzy.js.
character_positions = fuzzaldrin.match(candidate, query)
I dont really like this syntax but it was what was required for atom. I may provide a function that wrap inside specified stard/end tag.
I'll cut a release once the version of Fuse.js we use is out of beta.
Fuse is based on edit distance. And thus is much better comparing word (eg spellcheck) between them than expanding shortcut into a large result. I see v2 of Fuse tokenize the candidate into multiple word then do the comparison over each, summing result.
That approach may works well, up to the point the user type a bit of this word, and a bit of that word. You'll also see in the screenshot that every email address part is compared to gmail
so a lot is highlighted.
@janraasch: The current solution is fantastic but some side of the aggressive matching still hurts the performance. Please refer to the example below. What's your thought on it? :)
Thanks for the insight on jeancroy/fuzzaldrin-plus, @jeancroy, would be interesting to compare the result between these two.
@jeancroy, thank you very much for your detailed explanation. I am sorry for answering so late, but I am quiet busy as I am sure we all are (open source... :)).
@jmargatan Could you elaborate a little bit on what you mean by
aggressive matching still hurts the performance
I am sorry, but I am not sure what you mean just by looking at the screenshot.
And, lastly:
would be interesting to compare the result between these two.
Definitely, I agree. I will make sure to do this. (Now that I know the API ;))
@janraasch Certainly. :)
On the screenshot, I was trying to illustrate there are 2 candidates and the first one does not have the full text, "yahoo mail", but the the second one does at the end. In this case, I would assume the second candidate should have been ranked higher. At least, from LCS perspective. What's your thought on it?
Just to clarify is that screenshot taken using the beta version or the one currently available on the web store?
This is from the Chrome Web Store version.
I am not aware of a way to update a Chrome Extension. I assume I am on the latest version. Although the Web Store does say "Updated: November 16, 2015".
Ok.
Would be interesting to compare the result between these two
Original
Fuse.js beta
FuzzAldrin-Plus
Made this demo to toy with the project, hopefully it clarify the api/usage a bit too.
@jeancroy: Whoa this is cool! Let's see what @janraasch thinks. :)
IMHO the Fuse.js beta
version seems to work fine, especially, with this commit https://github.com/janraasch/tab-ahead/commit/877dbb83803ba7516416071cd34aaa0782917924 on master, which takes care of making the results less noisy by only highlighting matches with more than one character. I should probably release a new beta using the current master
so you can check it out, but maybe it is also easy enough to imagine the screenshot above w/o the single character matches :). What do you think, @jmargatan?
@jeancroy: For your examples you seem to be only searching in the title
String, can you search through title
and url
like with Fuse.js
?
var tabs = [{title: 'my-title', url: 'my-url'}]
var options = {
keys: ['url', 'title']
}
var f = new Fuse(tabs, options);
var result = f.search('my');
Alright, so here https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta3 is the current version built from master
.
You can download the TabAhead.zip
file, then unzip it, and under chrome://extensions/
enable Developer mode to be able to Load unpacked extension...
. The keyboard shortcut can be configured on the chrome://extensions/
page as well. There is a link named Keyboard shortcuts
on the bottom right.
Could you take this for a spin, @jmargatan? Maybe compare the results with your screenshot from this comment https://github.com/janraasch/tab-ahead/issues/34#issuecomment-212032984.
@janraasch Thanks for packing up this build. I tested with both scenarios:
From the second screenshot, the ranking does improve although the highlighted characters are rather confusing, note that the oo
at ...Google Search...
was highlighted.
Ok, one little caveat as this discussion is getting quiet long: We have to be a little bit careful not too overestimate the meaning of a few handpicked samples, because there are currently about 4600
people using this extension (with the old version) and none of them complained even though the current algorithm used is simply incorrect from a technical point of view.
Which is why I would say even though
highlighted characters are rather confusing, note that the oo at ...Google Search... was highlighted.
this is still a better solution, because
ranking does improve
so I would like to
Fuse.JS
) and release that as v1.3.0
to the public. What do you say?
For your examples you seem to be only searching in the title String, can you search through title and url like with Fuse.js?
There's the api to search for a specific key. fuzzaldrin.filter(candidates, query, {key:"title"})There's not yet the API to search for multiple of them. I'm thinking of averaging the score over search space, maybe let the user specify a weigh for every property. Other option is to select the best field, and still weight from which field it is.
I may add it, I'm a bit busy now, and most importantly, i'm now being super careful with updating that project because atom has so many users. It is still possible to achieve this goal using the fuzzaldrin.score(candidate,query) inside a double loop: foreach items / foreach indexedFields
none of them complained even though the current algorithm used is simply incorrect from a technical point of view.
I guess it depend on the complexity of the search text as well as the expectation of the user. One thing I think fuse will handle better is spelling mistake (gnail vs gmail) One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail) If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.
Close this issue and see what the feedback of the other users is like.
That sound like I plan. I can ping back if the multiple field feature is implemented. Or more cleanly you can open/subscribe to an issue that ask for it
@jeancroy thank you very much for the detailed insight into fuzzaldrin
and your informed opinion on the fuse vs. fuzzaldrin
matter considering the special use case at hand.
With your comment
One thing I think fuse will handle better is spelling mistake (gnail vs gmail) One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail) If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.
I am now even more confident to release the fuse
-version to the world.
I agree with you both. Thank you both for looking into this.
Just released v1.3.0
to the Chrome Web Store.
Consider the following use case.
Tab Opens:
When the query is
gmail
, the results are ordered as such (bold indicates highlighted letters):The proposal is to return the tab with full-text match on top of the fuzzy result, here is how it looks:
@janraasch: What do you think? :)