Closed dylanmtaylor closed 2 months ago
I've added requested changes to the associated PR (#503), but I'll add some thoughts here for good measure.
@dylanmtaylor is very much correct that we sometimes have spurious failures (various causes which likely include network issues), and those result in the team needing to manually retry some runs of the workflow. As an example, a spurious failure will usually result in success for most of the matrix options, but one or two will fail.
I do agree that automatically retrying certain steps of the workflow will be helpful.
I identified the two most useful in my requested changes:
I've specifically requested that we do NOT auto-retry the most complex step, Build Image. The most common causes of failure here are legitimate, usually due to an upstream RPM dependency issue. The one spurious issue I do know of in Build Image is related to the github-release-install.sh shell script, which helps us install RPM packages directly from a project's GitHub release. This is where I'd like to see an improvement to the shell script to handle those failures and retry internally. I've already made one such attempt with only partial success.
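To illustrate what "retry internally" could look like, here is a minimal sketch of a generic retry wrapper in bash. It is not the actual contents of github-release-install.sh; the function name `retry` and its argument order (max attempts, delay in seconds, then the command) are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from github-release-install.sh.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Success: stop immediately and report success to the caller.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report failure so the build step still fails.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt ${attempt} failed, waiting ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example call (the URL and output path are placeholders):
#   retry 3 5 curl --fail --silent --show-error --location -o "$out" "$url"
```

Wrapping only the network call this way keeps legitimate failures, such as an upstream RPM dependency problem, visible: they fail on every attempt and the step still fails.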
In addition to all this, I'd really like to see these improvements in ublue-os/main... but I hesitate to implement them in the 6 other "foundational"/"hardware enablement" repos we maintain. We've already had some discussions on merging and cleaning them up, as it's currently very messy to maintain them all as distinct repos.
Hope that provides some context to any reader regarding my views on this topic.
Actually, I think we should close this as "done" since we merged the PR at the top and have continued to add appropriate retry logic in various places throughout the project.
I see that build actions sometimes fail. I think we should leverage the retry action on ublue builds with an attempt limit of 3. https://github.com/marketplace/actions/retry-action
That way, if it's a weird network issue or something, we won't have a day without a new image.