medic / cht-user-management

GNU Affero General Public License v3.0

Investigate whether this move-contact job ever executed #220

Open kennsippell opened 2 weeks ago

kennsippell commented 2 weeks ago

https://users-chis-ke.app.medicmobile.org/board/queue/MOVE_CONTACT_QUEUE/8414b9ba-e3b4-4d00-ad74-822d57452365?status=completed

Did this execute? Logs seem to just indicate that it was delayed because of sentinel backlog for 2 days and maybe nothing ever happened?

Nairobi backlog has been kinda wild for many days, so we probably did well not to execute the job.

Why is it in a completed state? Is 2 days the right max limit (seems like no)?

kennsippell commented 2 weeks ago

A few more:

mrjones-plip commented 2 weeks ago

@kennsippell - are all three moves on the Nairobi instance (nairobi-echis.health.go.ke)?

kennsippell commented 2 weeks ago

Yes. But there are 32 jobs which are delayed right now due to sentinel backlog and those span many instances (Busia, Turkana, etc)

paulpascal commented 2 weeks ago

Hi @kennsippell, after a quick investigation into the different jobs referenced here, here's what I found by reviewing each job's execution and inspecting its logs directly on the jobs board:

1. Job Timing and Delays:

2. Logs and Delay Details:

Below are excerpts from the first job's logs showing repeated postponements due to sentinel backlog, up to 3 November:

[2024-11-01T13:13:21.440Z]: Job ### postponed until 5:13 PM.  Reason was sentinel backlog.
[2024-11-01T17:13:22.265Z]: Job ### postponed until 9:13 PM.  Reason was sentinel backlog.
...
[2024-11-03T05:13:28.461Z]: Job ### postponed until 9:13 AM.  Reason was sentinel backlog.

3. Job Completion Status:

It seems that the partner may have been unaware of the final timing of these contact moves because the delays held back execution, leading to the assumption that the contacts were not moved. However, once the backlog was cleared, the system processed the moves, albeit at a later time.

cc: @mrjones-plip
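
To put a number on how long a job sat in the postponed state, the log lines quoted above can be parsed directly. This is only a sketch: the regular expression is inferred from the three excerpted lines, and `parsePostponements` and `hoursBetweenFirstAndLast` are hypothetical helper names, not functions from cht-user-management.

```typescript
// One postponement entry, as read from a line of the job's log.
interface Postponement {
  loggedAt: Date; // timestamp inside the square brackets
  until: string; // wall-clock text after "postponed until", e.g. "5:13 PM"
  reason: string; // e.g. "sentinel backlog"
}

// Assumed line shape, inferred from the excerpt:
// [<ISO timestamp>]: Job <id> postponed until <time>.  Reason was <reason>.
const POSTPONED =
  /^\[([^\]]+)\]: Job .+? postponed until (.+?)\.\s+Reason was (.+?)\.?$/;

function parsePostponements(log: string): Postponement[] {
  const result: Postponement[] = [];
  for (const line of log.split("\n")) {
    const match = POSTPONED.exec(line.trim());
    if (!match) continue; // skips "..." and any unrelated lines
    result.push({
      loggedAt: new Date(match[1]),
      until: match[2],
      reason: match[3],
    });
  }
  return result;
}

// Hours elapsed between the first and the last postponement entry.
function hoursBetweenFirstAndLast(entries: Postponement[]): number {
  if (entries.length === 0) return 0;
  const first = entries[0].loggedAt.getTime();
  const last = entries[entries.length - 1].loggedAt.getTime();
  return (last - first) / (60 * 60 * 1000);
}

const excerpt = `
[2024-11-01T13:13:21.440Z]: Job ### postponed until 5:13 PM.  Reason was sentinel backlog.
[2024-11-01T17:13:22.265Z]: Job ### postponed until 9:13 PM.  Reason was sentinel backlog.
...
[2024-11-03T05:13:28.461Z]: Job ### postponed until 9:13 AM.  Reason was sentinel backlog.
`;

const entries = parsePostponements(excerpt);
// For the excerpt above this covers roughly 40 hours of postponements.
console.log(entries.length, Math.round(hoursBetweenFirstAndLast(entries)));
```

Running this over the full log from the Logs tab, rather than the truncated excerpt, would show whether the last postponement was followed by an actual execution.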

kennsippell commented 2 weeks ago

ic. Seems the logs are truncated. Do you know a way to see the full log for a job?

paulpascal commented 2 weeks ago

Yes, I truncated it myself. On the board you can see the logs of each job in the Logs tab.

paulpascal commented 2 weeks ago

Another way I grab the logs:

kubectl --context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks  \
    --namespace users-chis-prod logs deploy/users-chis-ke-cht-user-management-worker \
    --since 8h

kennsippell commented 2 weeks ago

Oh. Is the output from cht-conf not included in the job's logs? Should it be? Perhaps that was my source of confusion.

paulpascal commented 2 weeks ago

Oh no, it's not included in the job's own logs.

> Should it be?

I think so.

> Perhaps that was my source of confusion.

These logs are currently available only from the worker logs.
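
If the cht-conf output were to be included in the job's own logs, one possible shape is to forward the process output line by line to the job. This is a sketch under assumptions: `JobLogger` is a hypothetical minimal interface (BullMQ's `Job` has a `log(row)` method of this shape), `forwardOutputToJobLog` is a made-up name, and how the worker actually invokes cht-conf is not shown in this thread.

```typescript
// Hypothetical minimal view of a queue job: anything with an async log(row) method.
interface JobLogger {
  log(row: string): Promise<unknown>;
}

// Forwards text chunks (for example a child process's stdout) to the job log,
// writing one log row per complete line of output.
async function forwardOutputToJobLog(
  chunks: AsyncIterable<string>,
  job: JobLogger,
): Promise<void> {
  let pending = "";
  for await (const chunk of chunks) {
    pending += chunk;
    const lines = pending.split(/\r?\n/);
    // The last element is an incomplete line; keep it for the next chunk.
    pending = lines.pop() ?? "";
    for (const line of lines) {
      if (line.length > 0) await job.log(line);
    }
  }
  // Flush whatever remains if the output did not end with a newline.
  if (pending.length > 0) await job.log(pending);
}
```

In Node, a child process's `stdout` stream can be iterated with `for await` and yields strings once `setEncoding("utf8")` has been called, so it could be passed in as `chunks`. Whether every cht-conf line belongs on the board, or only warnings and errors, is a separate decision.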