hercules-ci / support

User feedback, questions and our public roadmap. help@hercules-ci.com

Jobs do not seem to restart after forceful server reset #19

Open offlinehacker opened 5 years ago

offlinehacker commented 5 years ago

I had job parallelism set too high and my server became unresponsive. After I reset the server and redeployed with lower parallelism, jobs do not seem to restart. If I click retry, nothing happens and the response is a 404.

offlinehacker commented 5 years ago

It now seems that if I re-trigger the build with a new commit, the failed and stalled jobs are shared with the previous build, and now CI is stuck. Take a look here: https://hercules-ci.com/github/xtruder/kubenix/jobs/8 and here: https://hercules-ci.com/github/xtruder/kubenix/jobs/9

roberth commented 5 years ago

We've had to delay the features that would recover from this. I've manually reset your tasks.

domenkozar commented 5 years ago

This is an annoying one. We don't yet have "agent pings" that would allow us to see agent liveliness.

We're going to add a "Cancel" button so that jobs can be recovered manually, but the automatic fix based on agent liveliness is scheduled for the next sprint.

offlinehacker commented 5 years ago

It would also be nice to see logs after a job is canceled.

domenkozar commented 5 years ago

There won't be logs: if a job is cancelled before it reports a build-finished event, there are no logs to show. This will change once streaming of logs (#17) is implemented.

Note that the "cancel" workaround is planned for sprint #4, so expect a fix soon. We had to postpone agent liveliness for another sprint or two.

domenkozar commented 5 years ago

We do reschedule if an agent shuts down, but we don't yet handle the case of a forceful shutdown; those cases will need some kind of timeout.
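
For illustration only, here is a minimal sketch of how "agent pings" plus a timeout could requeue tasks after a forceful shutdown. It is not Hercules CI's actual implementation: the `Scheduler` and `Task` classes, the `ping`, `assign` and `reap` methods, and the 30-second timeout are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    # Hypothetical convention: None means the task is queued and unassigned.
    agent_id: Optional[str] = None


@dataclass
class Scheduler:
    # Assumed setting: how long an agent may stay silent before it counts as dead.
    timeout_seconds: float
    last_ping: Dict[str, float] = field(default_factory=dict)
    tasks: Dict[str, Task] = field(default_factory=dict)

    def ping(self, agent_id: str, now: float) -> None:
        """Record that an agent reported in at time `now`."""
        self.last_ping[agent_id] = now

    def assign(self, task_id: str, agent_id: str) -> None:
        """Mark a task as running on the given agent."""
        self.tasks[task_id] = Task(task_id, agent_id)

    def reap(self, now: float) -> List[str]:
        """Requeue tasks held by agents whose last ping is older than the timeout.

        Returns the ids of the requeued tasks. Agents that have never pinged
        are not tracked here, so their tasks are left untouched.
        """
        dead = {
            agent
            for agent, seen in self.last_ping.items()
            if now - seen > self.timeout_seconds
        }
        requeued = []
        for task in self.tasks.values():
            if task.agent_id in dead:
                task.agent_id = None  # back to the queue
                requeued.append(task.task_id)
        for agent in dead:
            del self.last_ping[agent]  # forget the agent until it pings again
        return requeued
```

In this sketch, a server would call `reap` periodically with the current time; an agent that was reset without a clean shutdown simply stops pinging, and its tasks return to the queue once the timeout elapses.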