Remember failed builds

lazka commented 4 years ago

We could create a "staging-failed" release which contains "pkgname-version.failed" assets which we can use to skip previously failed builds.
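A minimal sketch of how such markers could be used to skip previously failed builds. The function names and the `(pkgname, version)` queue shape are hypothetical and are not taken from buildqueue.py; only the "pkgname-version.failed" naming comes from the proposal above.

```python
from typing import Iterable, List, Set, Tuple

# Marker suffix from the proposal: "pkgname-version.failed".
FAILED_SUFFIX = ".failed"

Package = Tuple[str, str]  # (pkgname, version); hypothetical queue entry


def failed_marker_name(pkgname: str, version: str) -> str:
    """Return the asset name that would mark this exact build as failed."""
    return f"{pkgname}-{version}{FAILED_SUFFIX}"


def split_queue(
    queue: Iterable[Package], failed_assets: Set[str]
) -> Tuple[List[Package], List[Package]]:
    """Split the queue into (to_build, skipped) using the marker asset names.

    `failed_assets` is assumed to be the set of asset names currently
    attached to the "staging-failed" release.
    """
    to_build: List[Package] = []
    skipped: List[Package] = []
    for pkgname, version in queue:
        if failed_marker_name(pkgname, version) in failed_assets:
            skipped.append((pkgname, version))
        else:
            to_build.append((pkgname, version))
    return to_build, skipped
```

Because the version is part of the marker name, a package whose version is bumped no longer matches its old marker and would be attempted again without manual cleanup.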

eine commented 4 years ago

On the one hand, I don't know if the failed state should be permanent. For example, a package that depends on another one being available might fail once, but succeed in the next run. Hence, having to manually move that out of the "failed" list might be cumbersome.

On the other hand, some packages currently take too long to build, and as a result most runs take more than 4 hours. That's not desirable, because smaller packages that are already built have to wait a long time before being pushed to the pre-release.

Hence, I think we could put some "intelligence" into the system. After the latest changes, Github is a dependency of buildqueue.py. This means that, potentially, the functionality of eine/tip can be partially or completely replicated here (see https://github.com/eine/tip/blob/master/tip.py). For example, https://github.com/eine/tip/blob/master/tip.py#L100-L130 might be moved into a function/file upload_artifacts, which can be reused at the end of building each package. That would allow updating staging-m* or staging-failed right away, regardless of later jobs/tasks taking longer.
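As a rough illustration of what a reusable upload_artifacts helper could look like: the sketch below is not the code from tip.py. It assumes a release object that behaves like PyGithub's `GitRelease` (`get_assets()`, `upload_asset(path)`) with assets that expose `name` and `delete_asset()`. The replace-on-name-clash behaviour is also an assumption.

```python
import os
from typing import Iterable, List


def upload_artifacts(release, paths: Iterable[str]) -> List[str]:
    """Upload files to a release, replacing assets that share a file name.

    `release` is assumed to be a PyGithub-like GitRelease object.
    Returns the asset names that were uploaded, in order.
    """
    paths = list(paths)
    new_names = {os.path.basename(p) for p in paths}

    # Remove stale assets first: a release cannot hold two assets
    # with the same name.
    for asset in list(release.get_assets()):
        if asset.name in new_names:
            asset.delete_asset()

    uploaded: List[str] = []
    for path in paths:
        release.upload_asset(path)
        uploaded.append(os.path.basename(path))
    return uploaded
```

Calling such a helper right after each package finishes would publish results (or failure markers) immediately instead of at the end of the whole run.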

At the same time, when any job/task fails, we can check how long it took to fail. Then, future jobs can process new tasks first and, once they are done with those, retry the failed ones, from the shortest to the longest. We can optionally set a threshold to skip the longest ones.
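The ordering described above could be sketched as follows. The `Task` type, the `failed_after` field and the `max_failed_duration` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    # Seconds the previous attempt ran before failing;
    # None means the task has not failed before (it is "new").
    failed_after: Optional[float] = None


def schedule(
    tasks: Iterable[Task], max_failed_duration: Optional[float] = None
) -> List[Task]:
    """Order tasks: new ones first, then failed ones from shortest to longest.

    Failed tasks whose previous attempt ran longer than
    `max_failed_duration` seconds are dropped when a threshold is given.
    """
    tasks = list(tasks)
    new = [t for t in tasks if t.failed_after is None]
    failed = [t for t in tasks if t.failed_after is not None]
    if max_failed_duration is not None:
        failed = [t for t in failed if t.failed_after <= max_failed_duration]
    failed.sort(key=lambda t: t.failed_after)
    return new + failed
```

For example, with tasks that previously failed after 60 s, 600 s and 7200 s, a threshold of 3600 s would retry only the first two, after all new tasks.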

lazka commented 4 years ago

My hope is that, once everything has settled down, failed builds won't be that common. And we can just go into the release and remove the .failed files manually if needed.

Even if it turns out to be undesirable in the long run, it will make testing easier for now, since CI time isn't wasted on potentially failing builds.

lazka commented 4 years ago

btw, I see you post in every issue I end up finding when re github actions :D Do you know if it is now possible to trigger the same workflow on the same repo using the GITHUB_TOKEN from the workflow?

lazka commented 4 years ago

I've created #6 which, if implemented, would make package failures caused by dependency issues less likely, since we stop on the first error.

eine commented 4 years ago

> btw, I see you post in every issue I end up finding when re github actions :D

A couple of years ago Travis started to get worse, and I was really excited when GitHub Actions was announced. GitHub being the most important forge nowadays, and having been bought by Microsoft, I expected them to offer a high quality service. Then, I (like others) was quite frustrated to find out that most decisions were marketing-driven and not properly developed from an engineering point of view. I bet most of the posts you find were written on my way to becoming aware of that.

The same applies to supporting MSYS2 in windows-latest. I had to ask for multiple direct or indirect enhancements until we could achieve the relatively simple and nice UX that the current Action is providing (in part thanks to you, of course). If you have any question about GitHub Actions, there's a good chance that I know the answer or have asked about it somewhere. Do not hesitate to ask.

> Do you know if it is now possible to trigger the same workflow on the same repo using the GITHUB_TOKEN from the workflow?

AFAIK, it is not. The solution is technically very simple, but not desirable: create a PAT and use it in place of the GITHUB_TOKEN. You don't need to change anything else; exactly the same code/script works. The restriction that events/actions/instructions authorized by GITHUB_TOKEN cannot trigger events is an ARTIFICIAL limitation, meant to avoid infinite event loops due to poor workflow design. That's an indication of the users that GitHub/Microsoft are targeting. Another indication is the fact that Actions need to be written in JavaScript. I believe they should allow GITHUB_TOKENs to trigger events, and optionally show a large red warning for users to be aware. Currently, there are many legit use cases that are limited because they are trying to babysit developers.

Moreover, chances are that you don't want to create a PAT for your user and let an automated system do anything on your behalf. Hence, the recommended approach is to create a machine/bot account. That is, you need to handle a separate user and e-mail. The PAT you create for that user will still need to have write access to the repos (see https://github.community/t/triggering-a-new-workflow-from-another-workflow/16250/22), but at least you can make it limited to some org. I think it is undesirable that write access is required, because it allows the PAT to potentially overwrite important branches. But, AFAIK, that's the only possibility for now.

Anyway, it is strange to me that workflow dispatch events are not handled as regular webhooks (as commented in https://github.community/t/triggering-a-new-workflow-from-another-workflow/16250/20).
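For illustration, a workflow could be triggered with a PAT through the REST API's workflow dispatch endpoint roughly as sketched below. The function name and all argument values are hypothetical, and the target workflow is assumed to declare the `workflow_dispatch` trigger. The sketch only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, workflow_file: str, ref: str, token: str
) -> urllib.request.Request:
    """Build a POST request for the 'create a workflow dispatch event' endpoint.

    `token` is assumed to be a PAT (for example, from a bot account) that
    has write access to the repository.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A job could call this as its last step, reading the token from a repository secret exposed as an environment variable, to queue the next run of the same workflow.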

lazka commented 4 years ago

Thanks :)

The failed skipping is now implemented.

Some related future improvements:

Clean up old failed markers: #10
Instead of writing dummy files, include the build logs for files uploaded to staging-failed: #11

lazka commented 4 years ago

The reason I was asking was that we could keep builds short and just retrigger a new build until all is done. But cron should be fine for starters...

msys2 / msys2-autobuild

Remember failed builds #5