quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

Indexing performance drop #5028

Open fulmicoton opened 4 months ago

fulmicoton commented 4 months ago

We apparently have a performance regression between [benchmark comparison screenshot omitted], as spotted by @fmassot.

fulmicoton commented 4 months ago

Tantivy went from

-tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "6181c1e", default-features = false, features = [
+tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "92b5526", default-features = false, features = [

RaphaelMarinier commented 4 months ago

Note that the CPU time did not change (screenshot from 2024-05-24 10-58-05). So the regression is probably due to some reduction in computation parallelism, not the computations themselves taking more CPU time.

fulmicoton commented 4 months ago

Not necessarily tantivy then

PSeitz commented 4 months ago

The drop is caused by this PR: "Fix the ingest rate displayed in the CLI" (https://github.com/quickwit-oss/quickwit/pull/4682)

https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1574&search_metric=engine_duration (the name for 894188f19 should be before_fix_ingest_rate)

fulmicoton commented 4 months ago

:) I didn't see that coming :D

fulmicoton commented 4 months ago

@PSeitz This PR seems very safe BUT... it actually DOES something on the server side of ingest v1.

It introduced a change (hopefully a bugfix) in the code of our rate estimator. The rate estimator itself is used in a strange RateModulator.

The idea is this: Quickwit needs, one way or another, some backpressure mechanism. It was judged at the time that sending back a 429 status code could be a problem for clients.

To soften this (it does not really avoid it, but well), when Quickwit sees the queue memory getting close to its limit, it smoothly time::sleeps on the server side before returning a 200. If the memory limit is actually reached, however, we do return a 429.
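For context, here is a minimal sketch of that kind of sleep-based backpressure. The thresholds (0.8 cutoff, 500 ms max delay) and the names are made up for the example; this is not the actual RateModulator code.

```rust
use std::time::Duration;

/// Hypothetical outcome of one ingest request under sleep-based backpressure.
enum IngestDecision {
    /// Accept immediately (HTTP 200).
    Accept,
    /// Accept, but sleep first to slow the client down (still HTTP 200).
    AcceptAfter(Duration),
    /// Reject because the memory limit is reached (HTTP 429).
    Reject,
}

/// Maps the queue-memory usage ratio (0.0 = empty, 1.0 = at the limit)
/// to a decision. The curve below is purely illustrative.
fn backpressure_decision(memory_usage_ratio: f64) -> IngestDecision {
    if memory_usage_ratio >= 1.0 {
        IngestDecision::Reject
    } else if memory_usage_ratio >= 0.8 {
        // Grow the delay smoothly as memory pressure approaches the limit.
        let pressure = (memory_usage_ratio - 0.8) / 0.2; // 0.0..1.0
        let delay_ms = (pressure * 500.0) as u64;
        IngestDecision::AcceptAfter(Duration::from_millis(delay_ms))
    } else {
        IngestDecision::Accept
    }
}

#[tokio::main]
async fn main() {
    for ratio in [0.1, 0.85, 0.95, 1.2] {
        match backpressure_decision(ratio) {
            IngestDecision::Accept => println!("ratio {ratio}: 200 immediately"),
            IngestDecision::AcceptAfter(delay) => {
                // The server-side sleep is what slows ingestion down.
                tokio::time::sleep(delay).await;
                println!("ratio {ratio}: 200 after sleeping {delay:?}");
            }
            IngestDecision::Reject => println!("ratio {ratio}: 429 Too Many Requests"),
        }
    }
}
```

The point is that the server keeps answering 200; it just answers more and more slowly as memory pressure rises, and only rejects outright at the hard limit.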

We can try removing this logic and see if it works (the Quickwit client retries after 500ms upon receiving a 429 anyway). If this fixes the bug, then we can dig deeper and see if:

Can you rerun the bench without the rate modulation layer? You will find it in the start_ingest_client_if_needed function.

PSeitz commented 4 months ago

Disabling the rate modulator fixes the performance issue: https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1581&search_metric=engine_duration

The first call to get the memory usage in the rate modulator returns 5 (5x the max capacity), which then causes the ingestion to be slower. I tried setting memory_usage_ratio to a fixed 0.1, but it is still slightly slower than before. https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1581,1584&search_metric=engine_duration

[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 5.448237061500549e-8
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.004656488075852394
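For illustration only, one possible guard would be to clamp the estimator output to [0.0, 1.0] before it drives the delay, so a spurious first sample (like the "5x the max capacity" reading above) cannot push the modulator straight into its slowest regime. The function name and the clamp are assumptions for this sketch, not the actual fix or the RateModulator code.

```rust
/// Illustrative guard: keep the memory usage ratio in a sane range before
/// using it for backpressure decisions.
fn sanitize_ratio(raw_ratio: f64) -> f64 {
    raw_ratio.clamp(0.0, 1.0)
}

fn main() {
    // Values loosely inspired by the debug output above; 5.0 stands in for
    // the bogus "5x max capacity" first reading.
    for raw in [5.448237061500549e-8, 0.0023282133042812347, 5.0] {
        println!("raw = {raw:e} -> clamped = {}", sanitize_ratio(raw));
    }
}
```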