open-quantum-safe / tsc

OQS Technical Steering Committee resources
https://openquantumsafe.org/
Creative Commons Attribution 4.0 International

Decide procedure(s) to handle CI failures #24

Open baentsch opened 3 months ago

baentsch commented 3 months ago

The current state of CI seems unsatisfactory for a well-maintained project:

[image: grafik]

In the absence of strong maintainer presence (hopefully I don't have to apologize for my own absences now :-), this issue is to suggest creating a procedure to keep this from recurring:

1. Designate which sub projects have to be "CI green" on the "main" branch within a day of turning red. This arguably needs a decision on https://github.com/open-quantum-safe/tsc/issues/2; I'm willing to move that issue forward regardless, as per the discussion there, if the TSC finds this unnecessary to look into/resolve. I personally think this needs resolution, as it impacts user decisions as to which OQS sub projects one can depend on (or not).
2. Designate responsibilities for fixing CI. I'd normally say that should be the maintainers (or committers in their absence) of projects designated as being in "maintained" state. Right now this is only documented for liboqs and oqs-provider. Both are currently "CI red", pointing to a need to improve. For oqs-provider, can I ask @thb-sb whether it's OK for you to be listed as maintainer (and then assume responsibility for looking into such things in any future absences of myself)?
3. If there is general acceptance that "red CI state" is OK for some projects (?), this is to suggest documenting this decision in the README of each such sub project, maybe together with a public call searching for a maintainer (if not already declared to be EOL).

If there is someone else beyond the "designated sub project maintainers" (@vsoftco @dstebila @thb-sb @baentsch) willing to take ownership of this, please chime in.

dstebila commented 3 months ago

I do get emails when CI builds fail, but at this point they've become background noise and I have a hard time distinguishing when it's a failure we need to pay attention to or not.

Any recommendations @ryjones from other projects on how they handle monitoring of ongoing CI status? I know that the theory would be that ensuring things are green on a PR before it's merged would mean that CI on main is also green, but that doesn't seem to hold true for us.

dstebila commented 3 months ago

> 1. Designate which sub projects have to be "CI green" on "main" branch within a day of turning red. Arguably needs a decision on [OQS sub projects: Which ones to drop for good #2](https://github.com/open-quantum-safe/tsc/issues/2) ; I'm willing to move that issue forward regardless as per the discussion there if the TSC finds this unnecessary to look into/resolve. I personally think this needs resolution as it impacts user decisions as to which OQS sub projects one can depend on (or not).

Yes, this seems sensible: our commitment to keeping CI green should be tied to which subprojects we deem supported. I'll chime in on https://github.com/open-quantum-safe/tsc/issues/2 right after this comment.

ryjones commented 3 months ago

@dstebila you could set quality gates for an auto merge. Basically, if (some list of jobs don't fail) and (has the votes needed to merge) and (all conversations are resolved), the PR merges automatically.
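The three-part gate described above can be sketched as a plain predicate. This is only an illustration of the logic, not how GitHub or any OQS repository implements it; the `PullRequestState` type, its field names, and the job names in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequestState:
    # Hypothetical snapshot of a PR; field names are assumptions for illustration.
    job_results: dict[str, str]      # job name -> "success", "failure", ...
    approvals: int                   # number of approving reviews ("votes")
    unresolved_conversations: int    # open review threads


def ready_to_auto_merge(pr: PullRequestState,
                        required_jobs: list[str],
                        approvals_needed: int) -> bool:
    """Return True only if all three gates from the comment above hold."""
    # Gate 1: every job on the required list finished successfully.
    # A job that is missing from the results counts as not passed.
    jobs_ok = all(pr.job_results.get(job) == "success" for job in required_jobs)
    # Gate 2: the PR has the votes needed to merge.
    votes_ok = pr.approvals >= approvals_needed
    # Gate 3: all conversations are resolved.
    resolved = pr.unresolved_conversations == 0
    return jobs_ok and votes_ok and resolved
```

For example, with `required_jobs=["build", "test"]` and `approvals_needed=2`, a PR whose "test" job failed is held back even if it has enough approvals and no open conversations.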

dstebila commented 3 months ago

> @dstebila you could set quality gates for an auto merge. Basically, if (some list of jobs don't fail) and (has the votes needed to merge) and (all conversations are resolved)

I was asking more about getting auto-alerted if a weekly job fails. Basically, does GitHub have anything that would cover the same type of functionality we have on https://openquantumsafe.org/dashboard.html?

ryjones commented 3 months ago

Yes, you could do that. Basically, a weekly job that checks those statuses, files an issue or a new discussion item, and tags people. You could use something like this, with the first step being to check the build status list as appropriate.
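The core of such a weekly job might look roughly like the sketch below: given the latest CI result per sub project, it decides whether anything needs reporting and composes the issue text that tags people. How the statuses are fetched and how the issue is actually filed are left out; the function name, the dictionary keys, and the handles in the usage note are assumptions made for illustration.

```python
def build_failure_report(latest_runs, mention):
    """Compose an issue (title, body) for red CI runs, or None if all are green.

    latest_runs: list of dicts with the hypothetical keys
                 "project", "conclusion" and "url" (one entry per sub project).
    mention:     list of user handles to tag in the issue body.
    """
    # Anything that did not conclude with "success" is treated as red.
    failed = [run for run in latest_runs if run["conclusion"] != "success"]
    if not failed:
        return None  # nothing to report, so no issue should be filed

    title = f"CI red on main: {len(failed)} project(s)"
    lines = [f"- {run['project']}: {run['conclusion']} ({run['url']})"
             for run in failed]
    tags = " ".join(f"@{name}" for name in mention)
    body = "\n".join(lines + ["", f"cc {tags}"])
    return title, body
```

Called with made-up sample data such as one failing and one passing entry plus `mention=["maintainer-a"]`, it returns a title counting one red project and a body listing only the failing entry, ending in `cc @maintainer-a`. Returning `None` when everything is green keeps the job from adding to the background noise mentioned earlier in the thread.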

baentsch commented 1 month ago

You seem to be knowledgeable in this @ryjones, right? Would you be willing to take on this automation? As to which sub projects this should apply to: only those with GOVERNANCE.md files, agreed @dstebila?