Closed djklim87 closed 1 week ago
After debugging, I have discovered the issue and implemented a fix: https://github.com/manticoresoftware/buddy-core/pull/79
In short:
Previously, we used the `wait` method, which actually returns the exit code of the worker. We assumed that workers always stop with exit code 0, indicating no error. However, for some reason, Kafka workers sometimes finish with exit code 1. This caused issues because we had an `isRunning` check before sending the stop signal.
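To illustrate the general pitfall, here is a minimal Python sketch. It is not the buddy-core code: the `Worker` class and its method names are hypothetical, and the exact mechanism in Buddy is the one described in the linked PR. The sketch only shows how a running flag can go stale when it is updated only for exit code 0, while `wait` may return any code.

```python
import subprocess
import sys


class Worker:
    """Hypothetical wrapper around a child process (an assumption, not Buddy's API)."""

    def __init__(self, code: str) -> None:
        # Run a short Python snippet as the "worker" process.
        self.proc = subprocess.Popen([sys.executable, "-c", code])
        self.is_running = True

    def reap_assuming_zero(self) -> int:
        # Fragile pattern: the flag is cleared only when the exit code is 0,
        # so a worker that exits with 1 is still reported as running.
        exit_code = self.proc.wait()
        if exit_code == 0:
            self.is_running = False
        return exit_code

    def reap(self) -> int:
        # Safer pattern: once wait() returns, the process has exited,
        # whatever the exit code is, so the flag is always cleared.
        exit_code = self.proc.wait()
        self.is_running = False
        return exit_code


if __name__ == "__main__":
    worker = Worker("import sys; sys.exit(1)")
    print(worker.reap_assuming_zero(), worker.is_running)
    print(worker.reap(), worker.is_running)
```

The design point is that the running state should follow from whether the process has exited, not from the value of its exit code.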
What we should consider next:
I created a task for the follow-up fixes, but we are free to close this one if the fix works fine.
Here is the task: https://github.com/manticoresoftware/manticoresearch-buddy/issues/381
Bug Description:
When we start the process and then stop it for the first time, the stop succeeds. After we recreate the process, it starts successfully and does its job. But when we then call `stopProcessById`, the command does not seem to execute. I see this record in the logs:
[process] execute: stopWorkerById ["kafka_alter_0"]
but the worker keeps running.
Basically, you can see this in the Kafka integration:
Run searchd, filtering the logs to show only records from the worker
Create the environment
These commands already start the worker, so we will see some records about it in the logs
Next, let's stop it
Here we see an important record from the worker that it stops consuming
Worker: End consuming
Recreate it
And finally, stop it
In the logs, we don't see the record of consumption being stopped. This is our bug.
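The check described in these steps can be automated with a small log scan. This is a rough sketch, assuming the logs contain the two records quoted in this issue (`execute: stopWorkerById` and `Worker: End consuming`); the function name and the sample log lines are made up for illustration.

```python
from typing import Iterable, List

# Substrings taken from the log records quoted in this issue.
STOP_MARKER = "execute: stopWorkerById"
END_MARKER = "Worker: End consuming"


def unacknowledged_stops(log_lines: Iterable[str]) -> List[int]:
    """Return indices of stop requests that were never followed by an
    end-of-consumption record before the next stop request (or end of log)."""
    pending = None  # index of the last stop request still awaiting confirmation
    missing: List[int] = []
    for index, line in enumerate(log_lines):
        if STOP_MARKER in line:
            if pending is not None:
                missing.append(pending)
            pending = index
        elif END_MARKER in line and pending is not None:
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing


if __name__ == "__main__":
    # Hypothetical log excerpt: the first stop is confirmed, the second is not.
    sample = [
        '[process] execute: stopWorkerById ["kafka_alter_0"]',
        "Worker: End consuming",
        '[process] execute: stopWorkerById ["kafka_alter_0"]',
    ]
    print(unacknowledged_stops(sample))  # prints [2]
```

With the bug present, the second stop request would be reported as unacknowledged; after the fix, the list should be empty.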
Manticore Search Version:
Manticore 6.3.7 2484d6519@24092610 dev (columnar 2.3.1 f9ef8b9@24090411) (secondary 2.3.1 f9ef8b9@24090411) (knn 2.3.1 f9ef8b9@24090411)
Operating System Version:
docker
Have you tried the latest development version?
Yes
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.