nextapps-de / flexsearch

Next-Generation full text search library for Browser and Node.js
Apache License 2.0

2000x Slower query when searching with space #239

Closed: yhyh0 closed this issue 3 years ago

yhyh0 commented 3 years ago

Thanks for the library, it has been a great help.

The problem is that a query containing a space, such as 'ab cd', runs at only about 50 queries/second, while a query without a space can run as fast as 150k queries/second.

I have tried adjusting threshold, resolution, and limit, but none of them helped much. Setting 'tokenize' to 'strict' helps, but that is not an option for me: my dataset has about 1 million items, and most of them are just one word. I am using 'reverse' for now. I am also using worker, async, and cache.

Any suggestions for making it faster? It currently takes about 20 GB of RAM and 12% CPU, and I am willing to give it as much as 90 GB of RAM. I'm using v0.7.2. Thanks.

ts-thomas commented 3 years ago

Please try disabling the worker. That is the problem: you cannot reliably measure asynchronous, distributed processing, because it is delayed by the event loop. That is why a one-word query appears so much faster than a two-word query. Just do not use Worker.

yhyh0 commented 3 years ago

Just tried that, but the result is the same. With whitespace, it runs at less than 100 q/s; without, at more than 100,000 q/s. To measure the time, I waited for all the promises/results and then calculated an average over a few batches. Here is how I set up the index:

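const { Document } = require("flexsearch");

const index = new Document({
  charset: "latin:simple",
  tokenize: "reverse",
  cache: true,
  async: true,
  worker: false, // disabled, as suggested above
  threshold: 0,
  resolution: 4,

  document: {
    id: "id",
    field: ["w", "norm1"],          // fields that are indexed
    store: ["w", "l", "norm1"]      // fields kept in the document store
  }
});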

Shouldn't having a whitespace only make it 5-10 times slower?
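
For reference, the measurement procedure described above (wait for every promise in a batch, then average over a few batches) could be sketched roughly as follows. This is only an illustration, not the code that was actually run: `measureAsyncQps` and `fakeSearchAsync` are hypothetical names, and the stand-in search function would be replaced by the real async search call on the index.

```javascript
const { performance } = require("node:perf_hooks");

// Hypothetical stand-in for the real async search call on the index.
// It only splits the query into words so the sketch runs on its own.
async function fakeSearchAsync(query) {
  return query.split(" ");
}

// Runs `batches` batches. Each batch fires all queries at once and stops the
// clock only after every promise has resolved. Returns the overall
// queries/second across all batches.
async function measureAsyncQps(searchAsync, queries, batches = 5) {
  if (queries.length === 0 || batches < 1) {
    throw new RangeError("need at least one query and one batch");
  }
  let totalMs = 0;
  for (let b = 0; b < batches; b++) {
    const start = performance.now();
    await Promise.all(queries.map((q) => searchAsync(q)));
    totalMs += performance.now() - start;
  }
  return (queries.length * batches) / (totalMs / 1000);
}
```

Calling `measureAsyncQps(fakeSearchAsync, ["ab", "ab cd"])` resolves to a queries/second figure. Note that the time includes promise scheduling, so it measures more than the search itself.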

ts-thomas commented 3 years ago

For measuring, please also set async: false. After measuring, you can enable async again or just leave it disabled.
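
A purely synchronous timing loop along these lines might look like the sketch below. It is an assumption-laden illustration: `measureSyncQps` and `toySearch` are hypothetical names, and `toySearch` is a trivial substring filter standing in for a synchronous search call on the real index.

```javascript
const { performance } = require("node:perf_hooks");

// Hypothetical stand-in for a synchronous search on the real index.
const toyData = ["ab", "cd", "ab cd"];
function toySearch(query) {
  return toyData.filter((item) => item.includes(query));
}

// Times `rounds` passes over `queries` with a synchronous search function.
// No promises or workers are involved, so the event loop does not add delay.
// The hit count is returned so the search results are actually consumed.
function measureSyncQps(search, queries, rounds = 100) {
  let hits = 0;
  const start = performance.now();
  for (let r = 0; r < rounds; r++) {
    for (const q of queries) {
      hits += search(q).length;
    }
  }
  const elapsedMs = performance.now() - start;
  const qps = (queries.length * rounds) / (elapsedMs / 1000);
  return { qps, hits };
}
```

Running it once with only one-word queries and once with only queries containing a space, for example `measureSyncQps(toySearch, ["ab"])` versus `measureSyncQps(toySearch, ["ab cd"])`, gives two figures that can be compared directly. With the real index, `toySearch` would be swapped for something like `(q) => index.search(q)`.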