opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

[BUG] OSB hangs when runs carried out with large number of clients and target-throughput enabled #633

Open gkamat opened 3 weeks ago

gkamat commented 3 weeks ago

Describe the bug

For tests where the target throughput exceeds 4000 ops/s and search_clients exceeds 4000, OSB stalls and never reports results. This happens only when a target throughput is set. Based on the OSB logs, some workers never reach the joinpoint, which prevents OSB from reporting results.

On the target cluster, the run appears to have completed in the expected time, and the target throughput was achieved.

The cause is not clear, but it seems to be related to the target-throughput code path and to Thespianpy workers going missing (the suspicion is that they exit prematurely, which causes OSB to stall).
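
To make the suspected failure mode concrete, here is a toy model in plain Python threads. It is not OSB or Thespian code, and every name in it (`run_join_point`, `missing`, `timeout`) is hypothetical. It assumes the coordinator only proceeds once every worker has checked in at the joinpoint, so a single worker that exits without checking in leaves the wait unsatisfied. The timeout is there only so the sketch terminates; an untimed wait would block forever, which is what a stall would look like.

```python
import threading


def run_join_point(num_workers, missing, timeout=0.5):
    """Return True if all workers reached the joinpoint, False if the wait stalled.

    `missing` is a set of worker ids that exit before checking in
    (a stand-in for workers that terminate prematurely; an assumption).
    """
    reached = set()
    lock = threading.Lock()
    all_reached = threading.Event()

    def worker(worker_id):
        if worker_id in missing:
            return  # simulate a worker that exits without reaching the joinpoint
        with lock:
            reached.add(worker_id)
            if len(reached) == num_workers:
                all_reached.set()  # last worker to arrive releases the coordinator

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Coordinator side: with no timeout, this would block indefinitely
    # whenever at least one worker never checks in.
    return all_reached.wait(timeout)


if __name__ == "__main__":
    print("all workers present:", run_join_point(8, missing=set()))
    print("one worker missing: ", run_join_point(8, missing={3}, timeout=0.1))
```

If the real cause matches this model, the logs should show fewer joinpoint arrivals than the number of workers started.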

This may be related to the issue reported in #318.