krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
18.2k stars, 768 forks

Reusing previously generated index file #436

Closed tesla-cat closed 4 years ago

tesla-cat commented 4 years ago

Description

I intend to use Fuse with Google Firebase.

From the docs:

const myIndex = Fuse.createIndex(options.keys, books)
const myFuse = new Fuse(books, options, myIndex)

With this approach, we need to fetch all the books again and regenerate the index whenever a new book is added.

Describe the solution you'd like

is it possible to:

Also, in the line const myFuse = new Fuse(books, options, myIndex), why do we need to pass all the books as an argument? Why not use some information, like an ID, that is already accessible from myIndex, the way Lunr.js does it?

Thank you !

krisk commented 4 years ago

Feature already exists:

Whenever you add/remove an item via the functions add or removeAt, it will automatically also update the index. You can always get the newly generated index file via fuse.getIndex().

tesla-cat commented 4 years ago

Interesting! You do realize there is no add or removeAt in the documentation, right? 😂

tesla-cat commented 4 years ago

[image: performance comparison table from flexsearch.js]

Also, flexsearch.js published this table. Any rebuttal from Fuse.js?

krisk commented 4 years ago

I would need to dig into how the comparison is made. They're also using an older version of Fuse.js, which has since gone through several performance improvements.

krisk commented 4 years ago

@tesla-cat, I took a look at the performance comparison they're running.

I'm not seeing a fuzzy-search (with actual typos) performance check in there (does the library support it? see https://github.com/nextapps-de/flexsearch/issues/118). The performance test only ever runs exact-match searches, using the following queries:

var text_queries = "gulliver;great;country;time;people;little;master;took;feet;houyhnhnms".split(";");

Notably, it seems like flexsearch.js is pre-generating a dictionary of all the words in the list, with the key being the word and the value being the location where it appears. For exact-string search, this will always be an O(1) operation (i.e., map[<exact_word>]), and thus always faster than what Fuse.js does, which is fuzzy-matching.
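To make the dictionary idea concrete, here is a minimal sketch of that kind of pre-generated word map. It is not flexsearch.js's actual code; buildWordIndex, exactLookup, and the sample sentences are hypothetical names and data made up for illustration.

```javascript
// Hypothetical sketch of an exact-match word dictionary (not flexsearch.js internals).
// Key: a word. Value: the set of document positions where that word appears.
function buildWordIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    const words = text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(docId);
    }
  });
  return index;
}

// One hash lookup per query: the cost does not grow with the number of documents.
function exactLookup(index, query) {
  const hits = index.get(query.toLowerCase());
  return hits ? [...hits] : [];
}

// Made-up sample data for illustration.
const sampleDocs = [
  "Gulliver took a little time",
  "The people of a great country",
  "The master of the Houyhnhnms",
];
const wordIndex = buildWordIndex(sampleDocs);

console.log(exactLookup(wordIndex, "gulliver")); // finds the first sentence
console.log(exactLookup(wordIndex, "guliver"));  // misspelled key is not in the map
```

The lookup is fast precisely because it only checks whether the exact key exists, so a query that is off by one letter has nothing to match against.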

In the test, as soon as I introduce a typo, for example:

var text_queries = "guliver" // one L

flexsearch.js returns 0 results, and still shows 500k+ op/s, while Fuse.js returns actual results. So, from the looks of it, I'm not sure whether this is an adequate comparison to make.
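For contrast, here is a rough sketch of what typo-tolerant matching has to do. It uses a plain Levenshtein edit distance as a stand-in, which is an assumption made for illustration and not the algorithm Fuse.js actually runs; editDistance, fuzzyLookup, and the vocabulary list are hypothetical.

```javascript
// Classic Levenshtein edit distance, kept to two rows of the DP table.
// A stand-in for illustration only; this is not Fuse.js's matching algorithm.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Every candidate word has to be compared, so the work grows with the vocabulary.
function fuzzyLookup(words, query, maxDistance = 1) {
  return words.filter((word) => editDistance(word, query) <= maxDistance);
}

// Made-up vocabulary, borrowing a few words from the benchmark queries.
const vocabulary = ["gulliver", "great", "country", "time", "people"];

console.log(fuzzyLookup(vocabulary, "guliver")); // the one-L typo still finds "gulliver"
```

Unlike a single key lookup, this has to score the query against each candidate, which is why comparing raw op/s between exact matching and fuzzy matching says little about either library.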

tesla-cat commented 4 years ago

@krisk

Hey, thank you for teaching me this kind of fun stuff. That table looked absurd to me from the very beginning: 1000 times better than everyone else? That must be either Turing's work or an ignorant joke.

Great work!

tesla-cat commented 4 years ago

@krisk

Also, by the way, you have the coolest GitHub profile photo I have seen so far!

exogenesys commented 4 years ago

@krisk @tesla-cat

Can't I search with just the index I've previously created? Is it necessary to fetch all the books every time I want to use an old index? It seems odd and wasteful.