instructlab / instructlab-bot

GitHub bot to assist with the taxonomy contribution workflow
Apache License 2.0

Bot needs to be resilient if the worker running the jobs goes away #241

Open vishnoianil opened 4 months ago

vishnoianil commented 4 months ago

A user triggers a precheck/generate job and a worker picks it up. If the worker disconnects or goes down for any reason while it is executing the job, we need to do one of the following:

1) Track that the worker is gone, resubmit the job to another worker, and update the PR.
2) Time out the job and report the error to the user so they can resubmit it.
3) Do something smarter.
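To make options 1 and 2 concrete, here is a rough sketch of how heartbeat tracking with a bounded retry could fit together. It is not the bot's actual code: `JobTracker`, its method names, the 30-second timeout, the attempt limit, and the `pr-1` job ID are all hypothetical, and the in-memory lists stand in for whatever queue the bot really uses. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    job_id: str
    attempts: int = 0                # how many times a worker has claimed this job
    worker_id: Optional[str] = None  # worker currently running it, if any


@dataclass
class JobTracker:
    """Hypothetical in-memory tracker; a real bot would back this with its queue."""
    heartbeat_timeout: float = 30.0  # seconds of silence before a worker counts as gone
    max_attempts: int = 2            # claims allowed before the job is reported as failed
    pending: List[str] = field(default_factory=list)
    jobs: Dict[str, Job] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def submit(self, job_id: str) -> None:
        self.jobs[job_id] = Job(job_id)
        self.pending.append(job_id)

    def claim(self, worker_id: str, now: float) -> Optional[str]:
        """A worker asks for work; claiming also counts as a heartbeat."""
        self.last_seen[worker_id] = now
        if not self.pending:
            return None
        job = self.jobs[self.pending.pop(0)]
        job.worker_id = worker_id
        job.attempts += 1
        return job.job_id

    def heartbeat(self, worker_id: str, now: float) -> None:
        self.last_seen[worker_id] = now

    def complete(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)

    def reap(self, now: float) -> Tuple[List[str], List[str]]:
        """Find jobs whose worker went silent.

        Returns (requeued, failed): requeued jobs go back to the pending
        list (option 1); failed jobs have used up their attempts and should
        be reported on the PR so the user can resubmit (option 2).
        """
        requeued: List[str] = []
        failed: List[str] = []
        for job in list(self.jobs.values()):
            if job.worker_id is None:
                continue  # still pending, nothing to check
            if now - self.last_seen[job.worker_id] <= self.heartbeat_timeout:
                continue  # worker is still alive
            job.worker_id = None
            if job.attempts < self.max_attempts:
                self.pending.append(job.job_id)
                requeued.append(job.job_id)
            else:
                del self.jobs[job.job_id]
                failed.append(job.job_id)
        return requeued, failed
```

In this sketch the bot would call `reap` periodically, post a "job was resubmitted" comment for each requeued ID, and post an error asking the user to retrigger for each failed ID. Capping attempts keeps a job that reliably crashes its worker from being retried forever.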