Closed yhyh0 closed 3 years ago
Please try disabling the worker. That is the problem: you cannot measure asynchronous distributed processing this way, because the results are delayed by the event loop. That is why a one-word query looks so much faster than a two-word query. Just do not use Worker.
Just tried that, but it's the same. With whitespace, fewer than 100 q/s; without, more than 100,000 q/s. Yes, for measuring the time, I waited for all the promises/results and then calculated an average over a few batches. Here is how I set up the index:
const index = new Document({
    charset: "latin:simple",
    tokenize: "reverse",
    cache: true,
    async: true,
    worker: false,
    threshold: 0,
    resolution: 4,
    document: {
        id: "id",
        field: ["w", "norm1"],
        store: ["w", "l", "norm1"]
    }
});
Shouldn't having whitespace in the query only make it 5-10 times slower?
For measuring, please also use async: false. After measuring, just enable async again or leave it disabled.
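A minimal sketch of what a synchronous measurement could look like, so that event-loop scheduling is not part of the timing. The helper `measureQps` and the stand-in `fakeSearch` function are hypothetical names invented for this illustration, not part of FlexSearch; the sketch assumes Node.js 16+ where `performance` is a global.

```javascript
// Hypothetical helper (not a FlexSearch API): times a synchronous search
// function over several batches and returns the average queries per second.
function measureQps(searchFn, queries, batches = 5) {
  const rates = [];
  for (let b = 0; b < batches; b++) {
    const start = performance.now();
    for (const q of queries) {
      searchFn(q); // must return synchronously, i.e. async and worker disabled
    }
    const elapsedMs = performance.now() - start;
    // Guard against a zero reading on very fast batches.
    rates.push(queries.length / (Math.max(elapsedMs, 1e-6) / 1000));
  }
  return rates.reduce((sum, r) => sum + r, 0) / rates.length;
}

// Stand-in search function so the sketch runs without any library installed.
const words = ["ab", "cd", "abcd", "ab cd"];
const fakeSearch = (q) => words.filter((w) => w.includes(q));

const singleWord = measureQps(fakeSearch, ["ab", "cd"]);
const withSpace = measureQps(fakeSearch, ["ab cd"]);
console.log(`single word: ${singleWord.toFixed(0)} q/s`);
console.log(`with space: ${withSpace.toFixed(0)} q/s`);
```

With a real index created with async: false and worker: false, the stand-in would be replaced by something like `(q) => index.search(q)`, and the one-word and two-word query sets would be timed separately to compare them.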
Thanks for the library, it has been a great help.
It is just that when I search with a space, like 'ab cd', it runs at only about 50 queries/second. Without a space, it can go as fast as 150k queries/second.
I have tried adjusting threshold, resolution, and limit; none of them helped much. Setting 'tokenize' to 'strict' helps, but that is not an option for me, as my dataset has about 1 million items and most of them are just one word. I am using 'reverse' for now. I am also using worker, async, and cache.
Any suggestions on how I can make it faster? It takes about 20 GB of RAM and 12% CPU right now, and I am willing to give it as much as 90 GB of RAM. I'm using v0.7.2. Thanks.