Turns out the job was just taking a long time (2 hours to run)
actions:
@vishnoianil is adding one.
Add proof-of-life logging during the job. It's hard to tell whether the worker is just waiting on the API call results or is dead. We haven't seen anything hang yet, but it would have been helpful in this scenario.
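A rough sketch of what the proof-of-life logging could look like, assuming the job runs in its own goroutine. `runWithHeartbeat`, the job name, and the log wording are all made up for illustration, not taken from the actual worker code:

```go
package main

import (
	"log"
	"time"
)

// runWithHeartbeat is a hypothetical helper: it runs job in a goroutine and
// logs a proof-of-life line every interval until the job returns.
// interval must be positive, because time.NewTicker panics otherwise.
func runWithHeartbeat(name string, interval time.Duration, job func() error) error {
	start := time.Now()

	// Buffered so the job goroutine can always deliver its result and exit.
	done := make(chan error, 1)
	go func() { done <- job() }()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case err := <-done:
			log.Printf("job %q finished after %s (err: %v)",
				name, time.Since(start).Round(time.Second), err)
			return err
		case <-ticker.C:
			// The worker is still waiting on the job, not dead.
			log.Printf("job %q still running, elapsed %s",
				name, time.Since(start).Round(time.Second))
		}
	}
}
```

Called as something like `runWithHeartbeat("sync", time.Minute, job)`, a 2 hour job would leave a log line every minute, so a slow run and a dead worker look different in the logs.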
Should we time jobs out after an hour on our side?
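One possible shape for a timeout on our side, as a sketch only. `runWithTimeout`, `errJobTimeout`, and `slowJob` are hypothetical names. Note that Go cannot kill a goroutine from outside, so this only stops waiting and cancels the context; the job has to watch `ctx` to actually exit:

```go
package main

import (
	"context"
	"errors"
	"time"
)

// errJobTimeout is a hypothetical sentinel for jobs that exceed the limit.
var errJobTimeout = errors.New("job exceeded timeout")

// runWithTimeout runs job in its own goroutine and stops waiting after limit.
// The context is cancelled at the deadline so a cooperative job can exit.
func runWithTimeout(limit time.Duration, job func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	// Buffered so a job that finishes late does not block forever on send.
	done := make(chan error, 1)
	go func() { done <- job(ctx) }()

	select {
	case err := <-done:
		// A job that returns ctx.Err() at the deadline is still a timeout.
		if errors.Is(err, context.DeadlineExceeded) {
			return errJobTimeout
		}
		return err
	case <-ctx.Done():
		return errJobTimeout
	}
}

// slowJob simulates a long API call that takes d and returns early
// if the context is cancelled.
func slowJob(d time.Duration) func(ctx context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```

With a one hour limit this would be `runWithTimeout(time.Hour, job)`. A job that ignores `ctx` keeps running in the background after the timeout is reported, so the API call itself should take the context too.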
We probably need to add a hard timeout on the goroutine. Logs: