Open fulmicoton opened 4 months ago
Tantivy went from
-tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "6181c1e", default-features = false, features = [
+tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "92b5526", default-features = false, features = [
Note that the CPU time did not change, so the regression is probably due to reduced parallelism in the computation, and not to the computations themselves taking more CPU time.
Not necessarily tantivy then
The drop is caused by this PR: Fix the ingest rate displayed in the CLI https://github.com/quickwit-oss/quickwit/pull/4682
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1574&search_metric=engine_duration
(the name for 894188f19 should be before_fix_ingest_rate)
:) I didn't see that coming :D
@PSeitz This PR seems very safe BUT... it actually DOES something on the server side of ingest v1.
It introduced a change (hopefully a bugfix) in the code of our rate estimator.
The rate estimator itself is used in a strange RateModulator.
The idea is this: Quickwit needs, one way or another, some backpressure mechanism. It was judged at the time that sending back a 429 status code could be a problem for clients.
To avoid it (it does not really avoid it, but well), when Quickwit sees the queue memory getting close to its limit, it will smoothly time::sleep on the server side before returning a 200.
If the memory limit is reached, we do return a 429, however.
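To make the mechanism described above concrete, here is a minimal Rust sketch of that kind of decision. It is not the actual RateModulator code: the `Backpressure` enum, the `backpressure` function, the threshold at which throttling starts and the maximum delay are all hypothetical names and values chosen for illustration, and the real implementation also involves the rate estimator, which this sketch ignores.

```rust
use std::time::Duration;

/// Hypothetical outcome of the backpressure decision for one ingest request.
#[derive(Debug, PartialEq)]
enum Backpressure {
    /// Answer 200 right away.
    Accept,
    /// Sleep on the server side for this long, then answer 200.
    DelayThenAccept(Duration),
    /// The queue is full: answer with the rejection status so the client retries later.
    Reject,
}

// Assumed tuning knobs, not Quickwit's real values.
const THROTTLE_START_RATIO: f64 = 0.5;
const MAX_DELAY: Duration = Duration::from_millis(500);

/// Maps the queue memory usage ratio (used / capacity) to a decision.
fn backpressure(memory_usage_ratio: f64) -> Backpressure {
    // At or above capacity (or on a nonsensical NaN reading), reject outright.
    if memory_usage_ratio.is_nan() || memory_usage_ratio >= 1.0 {
        return Backpressure::Reject;
    }
    // Far from the limit: no throttling at all.
    if memory_usage_ratio <= THROTTLE_START_RATIO {
        return Backpressure::Accept;
    }
    // In between: the delay grows linearly from 0 to MAX_DELAY as the
    // ratio moves from THROTTLE_START_RATIO to 1.0.
    let ramp = (memory_usage_ratio - THROTTLE_START_RATIO) / (1.0 - THROTTLE_START_RATIO);
    Backpressure::DelayThenAccept(MAX_DELAY.mul_f64(ramp))
}
```

With a scheme like this, a memory usage ratio that is overestimated, even briefly, directly translates into server-side sleeps or rejections, which slows ingestion without adding any CPU time.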
We can try to remove this logic and see if it works (the Quickwit client retries after 500ms upon receiving a 429 anyway). If this fixes the bug, then we can dig deeper and see if:
Can you rerun the bench without the rate modulation layer?
You will find it in the start_ingest_client_if_needed function.
Disabling the rate modulator fixes the performance issue: https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1581&search_metric=engine_duration
The first call to get the memory usage in the rate modulator is returning 5 (5x the max capacity), which then causes the ingestion to be slower. I tested setting memory_usage_ratio to a fixed 0.1, but it's still slightly slower than before.
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1573,1581,1584&search_metric=engine_duration
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 5.448237061500549e-8
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.0023282133042812347
[quickwit-serve/src/rate_modulator.rs:63:9] memory_usage_ratio = 0.004656488075852394
We apparently have a performance regression between
as spotted by @fmassot