Open kennsippell opened 2 weeks ago
@kennsippell - are all three moves on the Nairobi instance (nairobi-echis.health.go.ke)?
Yes. But there are 32 jobs which are delayed right now due to sentinel backlog and those span many instances (Busia, Turkana, etc)
Hi @kennsippell, after a quick investigation into the different jobs referenced here, here's what I found by reviewing each job's execution and inspecting its logs directly on the jobs board:
Below are excerpts from the logs of the first job, showing that it was postponed repeatedly because of the sentinel backlog, up to 3 November:
[2024-11-01T13:13:21.440Z]: Job ### postponed until 5:13 PM. Reason was sentinel backlog.
[2024-11-01T17:13:22.265Z]: Job ### postponed until 9:13 PM. Reason was sentinel backlog.
...
[2024-11-03T05:13:28.461Z]: Job ### postponed until 9:13 AM. Reason was sentinel backlog.
It seems that the partner may have been unaware of the final timing of these contact moves due to the delays (preventing the execution), leading to the assumption that the contacts were not moved. However, once the backlog was cleared, the system processed the moves, albeit at a later time.
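To check how long a job was actually held back, the postponement lines can be tallied with a short script. This is only a sketch: it assumes every line follows the exact shape of the excerpts quoted above, and `parse_postponements` and `summarize` are hypothetical helper names, not part of any existing tool.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpts quoted in this thread (assumed format).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: Job (?P<job>\S+) postponed until (?P<until>.+?)\. "
    r"Reason was (?P<reason>.+?)\.$"
)


def parse_postponements(lines):
    """Return (timestamp, reason) for each postponement line; skip anything else."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. the "..." marker or unrelated log output
        # Older Python versions reject a trailing "Z" in fromisoformat.
        ts = datetime.fromisoformat(match.group("ts").replace("Z", "+00:00"))
        events.append((ts, match.group("reason")))
    return events


def summarize(lines):
    """Count postponements and measure the time between the first and the last."""
    events = parse_postponements(lines)
    if not events:
        return None
    first = min(ts for ts, _ in events)
    last = max(ts for ts, _ in events)
    return {"count": len(events), "first": first, "last": last, "span": last - first}
```

Fed the three lines quoted above, `summarize` reports a span of roughly 40 hours between the first and last postponement. The count only reflects the lines supplied, so a truncated excerpt undercounts.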
cc: @mrjones-plip
ic. Seems the logs are truncated. Do you know a way to see the full log for a job?
Yes, I truncated it myself, but on the board you can see the logs of each job in the Logs tab.
Another way I grab the logs:
kubectl --context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks \
--namespace users-chis-prod logs deploy/users-chis-ke-cht-user-management-worker \
--since 8h
Oh. Is the output from cht-conf not included in the job's logs? Should it be? Perhaps that was my source of confusion.
Oh no, it's not included in the job's own logs.
"Should it be?"
I think so.
"Perhaps that was my source of confusion."
These logs are currently available only from the worker logs.
https://users-chis-ke.app.medicmobile.org/board/queue/MOVE_CONTACT_QUEUE/8414b9ba-e3b4-4d00-ad74-822d57452365?status=completed
Did this execute? Logs seem to just indicate that it was delayed because of sentinel backlog for 2 days and maybe nothing ever happened?
Nairobi backlog has been kinda wild for many days, so we probably did well not to execute the job
Why is it in a completed state? Is 2 days the right max limit (seems like no)?
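One way to frame the question is as a three-way decision: run, postpone, or fail. The sketch below is not the worker's actual code. Every name and parameter in it (`decide`, `backlog_limit`, `max_wait`, `retry_after`) is hypothetical, and the 4-hour retry and 2-day limit are only inferred from the log excerpts and the discussion in this thread.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Outcome(Enum):
    RUN = "run"            # backlog is low enough, execute the move
    POSTPONE = "postpone"  # backlog too high, try again later
    FAIL = "fail"          # waited too long, surface it as a failure


def decide(created_at, now, backlog, backlog_limit, max_wait, retry_after):
    """Return (Outcome, next_attempt) for a queued job; all inputs are assumptions."""
    if backlog <= backlog_limit:
        return Outcome.RUN, None
    if now - created_at >= max_wait:
        # Marking the job failed, not completed, makes it visible that nothing ran.
        return Outcome.FAIL, None
    return Outcome.POSTPONE, now + retry_after


if __name__ == "__main__":
    created = datetime(2024, 11, 1, 13, 13, tzinfo=timezone.utc)
    now = created + timedelta(days=2)
    outcome, next_attempt = decide(
        created_at=created,
        now=now,
        backlog=50_000,        # hypothetical backlog size
        backlog_limit=3_000,   # hypothetical threshold
        max_wait=timedelta(days=2),
        retry_after=timedelta(hours=4),
    )
    print(outcome, next_attempt)
```

With an explicit `FAIL` outcome, a job that hits the wait limit without ever running would land in the failed list and not under completed.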