krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
18.05k stars 766 forks source link

Feature request: Highlighting Search key's in the result. #6

Closed Abhinav1217 closed 8 years ago

Abhinav1217 commented 11 years ago

Very simple and nice plugin. I have been looking for such fuzzy search library for calender app i am trying to build. I was hoping if there was some way to highlight or mark the query string in the result set.
What i am asking is based on command pallet in sublime text. I search for string 'rown' in search box ( eg.. on your website ). And the result set are shown with the query value visible. eg like in bold, "dan b rown ", " ro b gra n t ".

krisk commented 11 years ago

Ah, that is a highly desirable feature. However, there's one problem with it: Fuse is DOM agnostic.

Specifically, remember that Fuse operates on data objects only - in very simple terms, it iterates over text data and performs a fuzzy match algorithm. Ergo, the highlighting would have to be handled by something other than Fuse.

Nonetheless, what Fuse could do, however, is return all indices of the matches in the form of [index, length]:

Search: rown

With these indices, you can do a simple substr to highlight the text as you see fit.

I will think about this some more. Thanks!

Abhinav1217 commented 11 years ago

Thanks.. substring worked after a few tries. There is a visible performance difference so now it is clear why you didn't have this feature. But this is, as you said, is a highly desirable feature. I have chained the output to the display function like var result = displayResult(f.search('brwn')).
I am not a good plugin writer but i have seen lots of developer, writing there libraries allowing for extending it via helper plugins( eg Taffy db). Is there some scope to create plugins of similar desirable features for this library. As to keep the main library fast and efficient, but adding extra functionality for those who want it.
I have (highly unprofessional) code for highlighting the output and for sorting the output based on its rank. Currently i use it like this.

var a = f.search("bir");
var b = sort(a); //this is my own sort function
var result = displayResult(b);

Using them together(or individually) sure has a visible performance deterioration, but I am recently graduated student and so are my codes. And this adds a little extra spice my application needed.

Do think about extending with extra plugins. It will add shines to this already excellent masterpiece.

janraasch commented 11 years ago

return all indices of the matches in the form of [index, length].

+1

fxck commented 11 years ago

still planning on doing this?

krisk commented 11 years ago

@Abhinav1217, @fxck, I'm working on this right now. Pardon the delay.

fxck commented 11 years ago

Great!

smithamax commented 11 years ago

:+1:

fxck commented 11 years ago

Hey @krisk, has it proven to be too difficult, or you are just too busy at the moment?

KarlosF13 commented 11 years ago

When's this comming?

FiloSottile commented 10 years ago

+1

krisk commented 10 years ago

@fxck, @KarlosF13, @FiloSottile, @janraasch, and everyone else...

Sorry fellas, I've being extremely busy with everything else. The good news is that I have taken the entire month of December off work, so that should give me plenty of time to work on other things - this feature among them.

In the meantime, if anyone wants to have a go at it, feel free to submit a pull request!

fxck commented 10 years ago

Still looking forward to this :)

yeroshek commented 10 years ago

+1

hmvs commented 10 years ago

+1 to return indices

mhkeller commented 10 years ago

In the meantime, for those looking to implement in a UI, this might help https://github.com/jharding/bearhug

fxck commented 10 years ago

That itself is not a problem. Problem is that

all indices of the matches in the form of [index, length].

this looks promising, it's a part of atom, made for nodejs, but could probably be used on client side as well, after few modifications

janraasch commented 10 years ago

@fxck That is a very neat idea, thanks. Seems like the only hard nodejs dependency is the path module for require('path').sep. Should be easy to bundle up with browserifyor even simpler by providing a mock path module to be required.

Of course one has to be okay with the use case:

This library is used by Atom and so its focus will be on scoring and filtering paths, methods, and other things common when writing code. It therefore will specialize in handling common patterns in these types of strings such as characters like /, -, and _, and also handling of camel cased text.

fxck commented 9 years ago

any progress @krisk ? fusejs is still the only viable fuzzy search around, hightlight is the only thing it is missing :)

krisk commented 9 years ago

Wow. Long hiatus here. Feature was suggested almost 2 years ago. Mea culpa. Confession: still didn't get around to doing this, but clearly there's still some interest in this feature. Anyone interested in tackling this problem? As it stands, the underlying bitap algorithm would most likely have to change to support this.

Abhinav1217 commented 9 years ago

This really still is a desirable feature. I haven't worked with fuse since the time I asked the question. As far as I remember, I chained the outputs to another function which used a regex script I found on gist, and modified it to insert <b></b> tag around the matched string, Before I displayed it.
That itself have a significant performance degradetion, but it was usable for project at thrashold of 0.0.

I would re-quote from my last reply here,

Do think about extending it with extra plugins. It will add shines to this already excellent masterpiece.

dalgard commented 9 years ago

+1

What @fxck said – the library is great, highlighting is the only thing that's really missing.

AdamGaskins commented 9 years ago

I totally support this! This is the one thing this library is missing.

AdamGaskins commented 9 years ago

I ended up modifying fuse.js to get it to work the way I needed, although I'm using a custom searcher. I'm super busy, but maybe I'll fork and recreate the changes if I get a spare moment.

Basically what I did in the searcher is this:

    return {
        isMatch: true,
        score: totalScore,
        highlight: calculatedHighlightedText,
    };

Then I added an additional option to Fuse:

        search = new Fuse(books, {
            keys: k,
            searchFn: values,
            include: ['score', 'highlight']
        });

Very simple to implement, and I was only digging in the code for a few minutes. Then of course the main task would be to implement the highlighting efficiently in BitapSearcher, which I didn't really look at (since I'm using a custom searcher).

Hope that helps someone! If not, like I said, I may come along and implement this myself at some point.

mapsam commented 8 years ago

+1 thanks!

marcoala commented 8 years ago

Any news?

marcoala commented 8 years ago

I digged a little bit into the code and I've something working, still need some test and better name for variables, but the basic idea is to have a charmask (not sure if is a thing) that represent the match, so:

https://github.com/marcoala/Fuse/commit/af05432f357e2bde71e8df835b9716dbb9428a1e

Search: rown dan brown --> '000001111' rob grant --> '11000010'

0 for not match, and 1 for match. Maybe the final can be represented with something in the form of [index, length] as pointed before, but I didn't figure out a way to do it without passing for the mask first. So @kriskwhat do you think? If you like the approach I can proceed with writing test and submitting an actual pull request.

krisk commented 8 years ago

@marcoala, thanks for looking into this :+1:

Instead of a character mask, I've made some changes to the codebase to use a match mask array, which simplifies the computation, and returns the matched indices:

var items = ['FH Mannheim', 'University Mannheim']
var fuse = new Fuse(items, {
  include: ['score', 'matchedIndices']
})
var result = fuse.search('Uni Mannheim')

// output
[{
  item: 1,
  matchedIndices: [ [ 0, 2 ], [ 4, 4 ], [ 7, 7 ], [ 10, 18 ] ],
  score: 0.32
}, {
  item: 0,
  matchedIndices: [ [ 1, 10 ] ],
  score: 0.25
}]

The matched indices are simply of the for [start, end], One can then use the indices as follow:

[ [ 0, 2 ], [ 4, 4 ], [ 7, 7 ], [ 10, 18 ] ] --> University Mannheim [ [ 1, 10 ] ] --> FH Mannheim

Another approach could be to use [start, length]. I'm unsure as to which one is preferable. If anyone is particularly enamored with one approach, speak now :)

I've published a beta version. Thoughts?

Also to consider: the matched indices would be useless if there's no way to tell for which key did the searched object and indices belong. For example:

var items = [{
  name: 'FH Mannheim',
  title: 'a Uni'
}, {
  name: 'University Mannheim',
  title: 'b Uni'
}]

var fuse = new Fuse(items, {
  keys: ['name', 'title'],
  include: ['score', 'matchedIndices']
})

Working on this right now...

krisk commented 8 years ago

Ok, I'm thinking this instead:

Input:

var items = [{
  name: 'FH Mannheim',
  title: 's Uni'
}, {
  name: 'University Mannheim',
  title: 'd Uni'
}]
var fuse = new Fuse(items, {
  verbose: true,
  keys: ['name', 'title'],
  include: ['score', 'matches']
})
var result = fuse.search('Uni')

Output:

[{
  item: {
    name: 'University Mannheim',
    title: 'd Uni',
    author: {
      more: 'un'
    }
  },
  matches: [{
    key: 'name',
    indices: [[0, 2]]
  }, {
    key: 'title',
    indices: [[2, 4]]
  }],
  score: 0.11811111111111111
}, {
  item: {
    name: 'FH Mannheim',
    title: 's Uni',
    author: {
      more: 'un'
    }
  },
  matches: [{
    key: 'title',
    indices: [[2, 4]]
  }],
  score: 0.17666666666666667
}]

This way you'll know to which keys the matched indices belong.

janraasch commented 8 years ago

Sounds great. Is that part of 2.2.0-beta?

krisk commented 8 years ago

@janraasch yes, I just updated it. Have not published to npm yet. Nonetheless, try it out. Let me know what you think. I still need to create a list of tests against it.

janraasch commented 8 years ago

Alright, cool. I'll check it out.

krisk commented 8 years ago

For reference, once you have the matched indices, you can then do a simple string builder to format the text according to those indices:

var text = 'hello world' // assume this is the text 
var result = []
var matches = [[1, 3], [6, 8]] // assume these are the matched indices
var pair = matches.shift()
// Build the formatted string
for (var i = 0; i < text.length; i++) {
  var char = text.charAt(i)
  if (pair && i == pair[0]) {
    result.push('<b>')
  }
  result.push(char)
  if (pair && i == pair[1]) {
    result.push('</b>')
    pair = matches.shift()
  }
}
console.log(result.join(''))
// h<b>ell</b>o <b>wor</b>ld
janraasch commented 8 years ago

Works great.

krisk commented 8 years ago

@janraasch glad to hear. I'll run through some tests, and perhaps intermittently there will be some feedback from this thread. If all goes well, we can expect version 2.2.0 before the end of the week.

janraasch commented 8 years ago

I'd say, we wait for 29th, then this thread will be 3 years old. :dart: :dancers:

:wink: (I am kidding of course, I am actually really glad, that this library is still under development and even older threads like this are alive and kickin!)

marcoala commented 8 years ago

@krisk I like your of approach, mine was just a rude attempt to hack a solution, I just spent an hour in an unknown code.

As final user the couples [start,end] give me the opportunity to do nice things like "highlight only group of at least 3 characters" or "highlight at most 3 group of match", to avoid things like 'example string'.

EDIT: can't test this weekend, I'll try to give you some feedback next week.

krisk commented 8 years ago

@janraasch yes, 3 years later :smile:. Mea culpa. It took @marcoala to kickstart it, so many thanks to him!

Abhinav1217 commented 8 years ago

Wow. Works great. so far. I did get some unwanted results. Will post some results after I find some definite pattern.

krisk commented 8 years ago

@Abhinav1217 any word on the unwanted results? Thanks! :smile:

krisk commented 8 years ago

Also, does anyone want to volunteer to create more test cases, and adding documentation to the README? I'd be forever indebted :sweat_smile:

farzher commented 8 years ago

Sometimes pair[0] is greater than pair[1], which messes up the loop that inserts wrapper html.

Idk too many details about this, I just wanted to make you aware that this happens sometimes

Pysis868 commented 6 years ago

Saw krisk's sample highlighting method. It was useful as a reference! I made a slightly different one that uses substring functions instead of going by each character:

// Does not account for overlapping highlighted regions, if that exists at all O_o..
function generateHighlightedText(text, regions) {
  if(!regions) return text;

  var content = '', nextUnhighlightedRegionStartingIndex = 0;

  regions.forEach(function(region) {
    content += '' +
      text.substring(nextUnhighlightedRegionStartingIndex, region[0]) +
      '<span class="highlight">' +
        text.substring(region[0], region[1]) +
      '</span>' +
    '';
    nextUnhighlightedRegionStartingIndex = region[1];
  });

  content += text.substring(nextUnhighlightedRegionStartingIndex);

  return content;
};

Added a comment thinking it could be more complex, but not sure if anybody needs overlapping highlights yet. Not sure how to implement it that way, but a different matches data structure would probably be needed, like having a sort of tree array.

Also, I know Krisk mentioned this is not really a feature for the core library, but as I was making some other methods, maybe some formatters inserted to be run at certain event/hook points could be in order. Maybe that could be how one would implement a plugin system for the library anyway, but even generally anyone could use that feature immediately if they wanted too.

evenfrost commented 5 years ago

Thanks for the helper @Pysis868, I took the liberty of modifying it a little bit and made a function to format the fuse.search output right away. It returns input objects with modified highlighted values (using TypeScript):

const highlight = (fuseSearchResult: any, highlightClassName: string = 'highlight') => {
  const set = (obj: object, path: string, value: any) => {
      const pathValue = path.split('.');
      let i;

      for (i = 0; i < pathValue.length - 1; i++) {
        obj = obj[pathValue[i]];
      }

      obj[pathValue[i]] = value;
  };

  const generateHighlightedText = (inputText: string, regions: number[] = []) => {
    let content = '';
    let nextUnhighlightedRegionStartingIndex = 0;

    regions.forEach(region => {
      const lastRegionNextIndex = region[1] + 1;

      content += [
        inputText.substring(nextUnhighlightedRegionStartingIndex, region[0]),
        `<span class="${highlightClassName}">`,
        inputText.substring(region[0], lastRegionNextIndex),
        '</span>',
      ].join('');

      nextUnhighlightedRegionStartingIndex = lastRegionNextIndex;
    });

    content += inputText.substring(nextUnhighlightedRegionStartingIndex);

    return content;
  };

  return fuseSearchResult
    .filter(({ matches }: any) => matches && matches.length)
    .map(({ item, matches }: any) => {
      const highlightedItem = { ...item };

      matches.forEach((match: any) => {
        _.set(highlightedItem, match.key, generateHighlightedText(match.value, match.indices));
      });

      return highlightedItem;
    });
};

// usage:

const res = highlight(fuse.search(text)); // array of items with highlighted fields

Also, here is the gist.

titanism commented 7 months ago

https://github.com/tricinel/highlight-words