openSUSE / openSUSE-release-tools

Tools to aid in staging and release work for openSUSE/SUSE
GNU General Public License v2.0

bci_repo_publish.py: Return 1 if the last published build is over a week old #2998

Open Vogtinator opened 1 year ago

Vogtinator commented 1 year ago

This way it's easy to spot whether a manual look is required.

I'm not entirely sure whether this is a good idea. Ideally someone looks directly when builds or tests fail and fixes this before a week expires, but as an additional safeguard it's probably worth it.

codecov-commenter commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Please upload report for BASE (master@2a0b30a). Report is 271 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             master    #2998   +/-   ##
=========================================
  Coverage          ?   28.45%
=========================================
  Files             ?       85
  Lines             ?    14716
  Branches          ?        0
=========================================
  Hits              ?     4188
  Misses            ?    10528
  Partials          ?        0
```


dirkmueller commented 1 year ago

I think we should simply stop publishing when there are failed builds? I'm not sure what the cutoff of 7 days is solving?

Vogtinator commented 1 year ago

> I'm not sure what the cutoff of 7 days is solving?

Currently the bot fails only if some API call actually fails, but not if a build failed or openQA test failed, as this is just expected behaviour. This means that if the repo is unresolvable for a month, the only sign is that there haven't been any openQA runs recently. Or if openQA fails for a month because nobody had a look, the jobs are still green.

With this change it's visible in botmaster as a pipeline failure if there hasn't been a new build published for a week.
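As a rough illustration of the check described above (not the actual code of this PR), the exit status could be derived from the age of the last published build. The names `staleness_exit_code` and `MAX_AGE` and the function's parameters are hypothetical; only the one-week cutoff comes from the PR title.

```python
from datetime import datetime, timedelta, timezone

# Cutoff taken from the PR title; the constant name is hypothetical.
MAX_AGE = timedelta(days=7)


def staleness_exit_code(last_published: datetime,
                        now: datetime,
                        max_age: timedelta = MAX_AGE) -> int:
    """Return 1 if the last published build is older than max_age, else 0."""
    return 1 if now - last_published > max_age else 0


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical timestamp: last publish happened eight days ago.
    print(staleness_exit_code(now - timedelta(days=8), now))
```

A caller would pass the result to `sys.exit()`, so that a stale repo shows up as a failed pipeline run.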

dirkmueller commented 1 year ago

so if I understand this correctly, the only goal of the whole logic is to make the checkbox "red" rather than "green" in the botmaster listing/graph.

how about we change the method to return True/False on whether it actually did a publish or not and then simply convert that to an exit code of 0 / 1?

then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?
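A minimal sketch of that idea, assuming the publish step can report both whether there was something to publish and whether it was published. The function `publish_exit_code` and its two parameters are hypothetical and do not exist in `bci_repo_publish.py`.

```python
def publish_exit_code(had_candidate: bool, published: bool) -> int:
    """Map the outcome of a publish attempt to a process exit code.

    1 (red):   there was something to publish, but it was not published.
    0 (green): it was published, or there was nothing to publish.
    """
    return 1 if had_candidate and not published else 0


if __name__ == "__main__":
    # Hypothetical outcome: a build was available but did not get published.
    print(publish_exit_code(had_candidate=True, published=False))
```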

does the UI support triple state? then we could encode the "still waiting" states with another color.

Vogtinator commented 1 year ago

> so if I understand this correctly, the only goal of the whole logic is to make the checkbox "red" rather than "green" in the botmaster listing/graph.

Yes.

> how about we change the method to return True/False on whether it actually did a publish or not and then simply convert that to an exit code of 0 / 1?
>
> then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?

This wouldn't be able to catch if there was no new build for a week, e.g. due to unresolvable or blocked.

> does the UI support triple state? then we could encode the "still waiting" states with another color.

There's only orange for "pipeline is running"

dirkmueller commented 1 year ago

> > then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?
>
> This wouldn't be able to catch if there was no new build for a week, e.g. due to unresolvable or blocked.

well, unresolvable or build failed would lead to an exit 1, as would a missing openQA run. an unfinished openQA run or an unclean build status would not.
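The classification above could be sketched as follows. The `State` enum, its member names, and `exit_code_for_state` are hypothetical; they only mirror the cases listed in this comment and are not taken from the tool.

```python
from enum import Enum


class State(Enum):
    # Hypothetical names for the situations mentioned in the comment above.
    UNRESOLVABLE = "unresolvable"
    BUILD_FAILED = "build failed"
    OPENQA_RUN_MISSING = "openQA run missing"
    OPENQA_RUN_UNFINISHED = "openQA run unfinished"
    BUILD_UNCLEAN = "build status not clean"


# States that should turn the pipeline red immediately.
FAILING_STATES = frozenset({
    State.UNRESOLVABLE,
    State.BUILD_FAILED,
    State.OPENQA_RUN_MISSING,
})


def exit_code_for_state(state: State) -> int:
    """Return 1 for states that need attention, 0 for states that may still resolve."""
    return 1 if state in FAILING_STATES else 0


if __name__ == "__main__":
    for state in State:
        print(f"{state.value}: exit {exit_code_for_state(state)}")
```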

> > does the UI support triple state? then we could encode the "still waiting" states with another color.
>
> There's only orange for "pipeline is running"

ok, too bad.

Well, I don't have strong feelings either way. I think the 7-day wait is arbitrary and feels a bit too long. Maybe 3 days? Other than that, I still think the better approach is to determine whether the status is "this looks erratic" or "this can still be normal" and then exit 1 or 0 right away. We don't want to wait a week to notice that openQA has lost an event or the like; that is not going to help anyone.