thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

ci build errors due to cancelled jobs #2908

Closed reubenmiller closed 3 months ago

reubenmiller commented 4 months ago

Describe the bug

Recent failures on the merge queue show unexpected behaviour when a build job is cancelled: the workflow continued even though the build for one architecture had failed, so only half of the packages were published.

Runs with unexpected behaviour

reubenmiller commented 4 months ago

A PR was merged to add an additional dependency to the build job, and the subsequent behaviour will be monitored.

reubenmiller commented 4 months ago

There is an active discussion about symptoms similar to the following error:

The hosted runner encountered an error while running your job. (Error Type: Failure).

The above was taken from this run:

reubenmiller commented 4 months ago

Root cause

There is an active GitHub issue in which other projects have reported similar problems with GitHub workflows.

There seems to be a problem with the GitHub runner that sporadically causes jobs to be marked as skipped for unknown reasons. The error can manifest in slightly different ways, but the following symptoms were seen in the thin-edge.io project:

Secondary effects

The root cause had an unexpected side effect: the publish job still ran after the check-build job had failed, because the job's if block uses always(). The always() call is required because the upstream test job is conditional; without always(), the publish job would also be skipped whenever the test job is skipped.

To handle the skip case better, an explicit check that the check-build job's result is success was added:

publish:
  name: Publish ${{ matrix.job.target }}
  if: |
    always() &&
    github.event_name != 'pull_request_target' &&
    (needs.check-build.result == 'success') &&
    (needs.test.result == 'success' || needs.test.result == 'skipped')
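
For context, the interaction between always() and the needs results can be sketched with a trimmed-down workflow. This is only an illustration, not the project's actual workflow: the trigger, the runs-on values, the step bodies and the condition on the test job are hypothetical placeholders. Only the job names check-build, test and publish and the result checks come from the snippet above.

```yaml
# Minimal sketch (assumed layout, not the real thin-edge.io workflow)
on: push  # placeholder trigger

jobs:
  check-build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "placeholder for the build checks"

  test:
    # Hypothetical condition: with the push trigger above it is false,
    # so this job ends with the result 'skipped'
    if: github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - run: echo "placeholder for the tests"

  publish:
    needs: [check-build, test]
    # Without a status check function, an implicit success() is applied,
    # so a skipped or failed job in `needs` would skip this job as well.
    # always() removes that implicit check, which is why each upstream
    # result has to be tested explicitly.
    if: |
      always() &&
      needs.check-build.result == 'success' &&
      (needs.test.result == 'success' || needs.test.result == 'skipped')
    runs-on: ubuntu-latest
    steps:
      - run: echo "placeholder for publishing"
```

In this sketch, publish runs when test is skipped, but not when check-build ends with any result other than success, such as failure, cancelled or skipped. The GitHub documentation also describes !cancelled() as an alternative to always(); whether that would suit this workflow has not been checked here.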
reubenmiller commented 3 months ago

The linked GitHub issue has been closed and no more job cancellations have been observed.