krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
18.16k stars 767 forks source link

Performance issues #282

Closed KonradHoeffner closed 4 years ago

KonradHoeffner commented 5 years ago

A fuse search can take more than a second on a i7-4790 CPU @ 3.60GHz on Firefox Developer Edition 66.0b4 (64-bit) on Linux with an index of 4240 entries for a search query like "project manager". Is that normal? Is there a way to make it faster? Does Fuse calculate the score between each token and each index item (linear complexity) or does it use some sublinear lookup mechanism like a Levenshtein automaton?

My options are:

const options =
{
  shouldSort: true,
  tokenize: true,
  threshold: 0.25,
  maxPatternLength: 40,
  minMatchCharLength: 3,
  matchAllTokens: true,
  location: 0,
  distance: 100,
  id: "uri",
  keys:
  [
    {name:"l", weight: 0.7},
    {name:"def", weight: 0.3},
  ],
};

The index items look like this:


{
        "uri": "http://www.snik.eu/ontology/he/WechselderHardwaretechnologie",
        "l": [
                "Wechsel der Hardwaretechnologie",
                "WechselderHardwaretechnologie"
        ]
},{
        "uri": "http://www.snik.eu/ontology/he/Bedarfspotenzial",
        "l": [
                "Bedarfspotenzial",
                "Bedarfspotenzial"
        ],
        "def": "Ist kein Bedarfspotenzial vorhanden\nund/oder wird von den Bedarfsträgern kein Bedarf artikuliert kommt es entweder zu\nkeinem Projekt oder die Projektplanung wird vom Lenkungsausschuss verworfen weil sie\neiner projektbezogenen Wirtschaftlichkeitsanalyse nicht standhalten kann."
},{
        "uri": "http://www.snik.eu/ontology/bb/Ccow",
        "l": [
                "CCOW",
                "Ccow",
                "Clinical Context Object Workgroup"
        ],
        "def": "Develops standards for the synchronization of independent application components running on one workstation, in order to support contextual integration."
},
Sigmus commented 5 years ago

Is other stuff happening while you search? Like elements being rendered in the DOM, etc.

krisk commented 5 years ago

This would be very hard to troubleshoot. Some things that come to mind:

1) Are you searching as the word is typed? Or is it properly debounced? 2) As @Sigmus said, are there other operations happening? 3) 4,240 could very well be a lot of items

baer commented 5 years ago

I am searching through a set of ~14,000 complex records with the relatively aggressive config (see below for config and record format).

Given that, the performance bottleneck is unlikely to be Fuse.js. I think @krisk is right - there very well may be other things happening in your application. Have you checked your flame chart?

Fuse Config

const FILTERABLE_FIELDS = [
  `id`,
  `addresses.city`,
  `addresses.state`,
  `addresses.zip`,
  `branch.description`,
  `contact`,
  `name`,
]

const index = new Fuse(collection, {
  id: `id`,
  keys: FILTERABLE_FIELDS,
  includeScore: true,
  tokenize: true,
  maxPatternLength: 32,
  minMatchCharLength: 1
});

Object format

{
  "id":"<Customer ID>",
  "addresses":[{
    "addressLine1":"<Customer Street Address>",
    "addressLine2":"",
    "city":"Morrilton",
    "county":"Conway",
    "state":"AR",
    "zip":"72110"
  }],
  "branch":{
    "description":"Atkins"
  },
  "contact":"Jane Doe",
  "name":"My Favorite Auction House"
}
heyimalex commented 5 years ago

@baer It's hard to say. The author's def field contains a bunch of words, which increases time considerably. When tokenization is true, a good cost function is "tokens per element". Your example has 11, one of theirs has 32.

KonradHoeffner commented 5 years ago

Thanks for the information! The definitions can be very long indeed, I will consider limiting their length if it becomes a problem.

jbach commented 5 years ago

I’m observing the same problem in Firefox 66.0.2 (64-bit)/Mac: https://codesandbox.io/s/q3p1m2v1j4 Chrome is fast, Safari slower, Firefox really slow.

heyimalex commented 5 years ago

@jbach The performance difference is wild. I use firefox developer edition and have always thought fuse was kinda slow :') Chalk it up to differences in the engines I guess?

sidvishnoi commented 4 years ago

It's consistently slow in Firefox, while working fine with Chrome (even with 4x CPU throttling). I'm getting a ~100ms response with ~7000 items (an array of strings) ( ({ caseSensitive: true }, rest defaults) in Firefox on Linux. image

I'll try to debug. Can't find any source maps, so will try to create my own local bundle.

Update: Unfortunately, neither Chrome or Firefox debugger let me reach further inside bitap_search.js

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

sunspots commented 4 years ago

I see a big difference between Firefox and Chromium for decently small indexes, but it gets progressively worse for longer search input (10 char search text can take 120-150ms on Firefox while the same takes ~13ms in Chromium).

I have been unable to pinpoint any specific issue, but from the looks of it, pretty much all of the processing time is spent inside this loop

There does seem to be a significant amount of time spent on garbage collection (20% of time spent in the loop), so there could be some issue with memory allocation/use/cleanup.

All I can see points to differences in engine optimization. Could be GC/memory but it may well be some other optimizations that makes V8 really fast.

Edit: I think GC was triggered by my own modifications to the code (in an attempt to identify the correct section).

Following image is from Firefox profiler on Fuse v5.2.3 (bundled but otherwise untouched) search performance