gkamat opened this issue 1 year ago
After allowing some time for the merge operation to complete, a manual request issued via curl
returns instantaneously:
{"_shards":{"total":120,"successful":120,"failed":0}}
but OSB remains stuck at:
2023-06-05 23:03:17,281 ActorAddr-(T|:46511)/PID:213463 osbenchmark.worker_coordinator.worker_coordinator INFO Task assertions enabled: False
2023-06-05 23:03:17,281 ActorAddr-(T|:46511)/PID:213463 osbenchmark.worker_coordinator.worker_coordinator INFO Choosing [unthrottled] for [force-merge-1-seg].
2023-06-05 23:03:17,282 ActorAddr-(T|:46511)/PID:213463 osbenchmark.worker_coordinator.worker_coordinator INFO Creating iteration-count based schedule with [None] distribution for [force-merge-1-seg] with [0] warmup iterations and [1] iterations.
2023-06-05 23:03:17,282 ActorAddr-(T|:46511)/PID:213463 osbenchmark.worker_coordinator.worker_coordinator INFO iteration-count-based schedule will determine when the schedule for [force-merge-1-seg] terminates.
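For reference, the manual check described above can also be reproduced with a short script rather than curl. This is a minimal sketch using urllib3; the host, port, and index pattern are placeholder assumptions, not values taken from this report:

# Minimal sketch of the manual check above, using urllib3 instead of curl.
# Host, port, and index pattern are illustrative assumptions.
import json

import urllib3

http = urllib3.PoolManager()

# Re-issuing the force-merge against an already-merged index returns almost
# immediately with the shard summary shown above.
resp = http.request(
    "POST",
    "http://localhost:9200/logs-*/_forcemerge?max_num_segments=1",
    timeout=urllib3.Timeout(connect=5.0, read=60.0),
)
print(json.loads(resp.data))  # e.g. {"_shards": {"total": ..., "successful": ..., "failed": 0}}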
The issue appears to be on the OpenSearch side. Issuing a long-running force-merge via urllib3
outside of OSB sometimes never returns from the call. For now, a workaround of modifying the http_logs
workload locally might suffice.
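For illustration, the standalone call referred to here looks roughly like the following. This is a minimal sketch, assuming a plain urllib3 PoolManager and an explicit read timeout; OSB's own client configuration may differ, and the host and index pattern are placeholders:

# Sketch of a long-running force-merge issued outside of OSB via urllib3.
# Host, index pattern, and timeout values are assumptions for illustration.
import urllib3

http = urllib3.PoolManager()

try:
    # Without a bounded read timeout this call can block indefinitely if the
    # response never arrives, which matches the hang observed here.
    resp = http.request(
        "POST",
        "http://localhost:9200/logs-*/_forcemerge?max_num_segments=1",
        retries=False,
        timeout=urllib3.Timeout(connect=5.0, read=1800.0),
    )
    print(resp.status, resp.data[:200])
except urllib3.exceptions.ReadTimeoutError:
    # The merge may still finish server-side even though the client gave up;
    # a follow-up request (as in the curl check above) can confirm that.
    print("no force-merge response within the read timeout")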
@gkamat @IanHoang Do we need to open an issue in OpenSearch core to investigate this behavior? CC: @dblock @nknize
Describe the bug
With long-running requests like merges, OSB sometimes hangs although the operation has completed.
To Reproduce
This is intermittent, but running OSB with the http_logs workload occasionally hangs on the force-merge-1-seg request. OSB eventually returns, but long after the cluster has completed the task. This can be verified by re-running the request manually while OSB is still waiting for completion.

Expected behavior
No "hanging" or "stuck" behavior, and accurate reporting on the disposition of the request.
More Context (please complete the following information)
OSB 1.0 against OpenSearch 2.3+
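As a concrete starting point for the reproduction steps above, this is a rough sketch of the invocation, assuming OSB 1.x's execute-test subcommand run against an existing cluster; the target host and the absence of security options are placeholder assumptions, and the Python wrapper is only a convenience around the CLI:

# Rough sketch: invoke OSB with the http_logs workload against an existing
# cluster. Subcommand and flags follow the OSB 1.x CLI; the target host and
# security settings are placeholder assumptions.
import subprocess

subprocess.run(
    [
        "opensearch-benchmark", "execute-test",
        "--workload=http_logs",
        "--pipeline=benchmark-only",
        "--target-hosts=localhost:9200",
    ],
    check=True,
)
# If the run stalls on force-merge-1-seg, re-issue the force-merge manually
# (as shown earlier in this thread) to confirm the cluster has already
# finished the merge while OSB is still waiting.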