osrf / buildfarm-tools

Improvements to the daily checker buildfarm #57

Open Crola1702 opened 2 weeks ago

Crola1702 commented 2 weeks ago

Description

Throughout the development of the green buildfarm subproject, we've found some issues with the current daily checker workflow, in particular with its prioritization method.

Some considerations we should have for improved daily checker workflow:

Prioritization:

Check buildfarm script

Sample report.

This is an example of what new reports should look like:

# Buildfarmer log
> Probably skip for Saturday and Sunday (show big log diff on Monday)

## New X items to investigate (+/- Y): ?? No new issues!
> Show new reports that didn’t exist yesterday

### Build regressions:
> Show only consistent regressions
> For build regressions should keep just 1 time in a row

- Issue in job : failed X times in a row
- Issue in job : happened Y times in the last 2 weeks (flaky)
> Ignore ClosedChannel exception ones

### Test regressions:
- Issue in job : failed X times in a row
- Issue in job : happened Y times in the last 2 weeks (flaky)

### Warnings:
- Job contains warnings (+/- X)

## Continue investigating: X items (+/- Y):
> Show reports that still exist from yesterday

### Build regressions:
> Show only consistent regressions
> For build regressions should keep just 1 time in a row

- Issue in job : failed X times in a row
- Issue in job : happened Y times in the last 2 weeks (flaky)

### Test regressions:
- Issue in job : failed X times in a row
- Issue in job : happened Y times in the last 2 weeks (flaky)

### Warnings:
- Job contains warnings

## Old issues:
> Show known issues

### Jobs to check:
- Job hasn’t passed in days

### Reported issues
> Integration with `gh cli`

- Issue hasn’t been updated in days
- Issue hasn’t happened in days. Should check!

### Disabled/Skipped tests:
- Total: (+/- X)

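The two bullet shapes in the sample ("failed X times in a row" and "happened Y times in the last 2 weeks (flaky)") could be derived roughly as follows. This is only a sketch: the function names, the `min_streak` threshold, and the issue and job names in the usage note are hypothetical and are not taken from the existing checker scripts.

```python
from datetime import date, timedelta


def consecutive_failures(results):
    """Count failures at the end of `results` (oldest first, True = failed)."""
    streak = 0
    for failed in reversed(results):
        if not failed:
            break
        streak += 1
    return streak


def failures_in_window(failure_dates, today, days=14):
    """Count failures dated within the last `days` days, today included."""
    cutoff = today - timedelta(days=days)
    return sum(1 for d in failure_dates if cutoff < d <= today)


def describe(issue, job, results, failure_dates, today, min_streak=2):
    """Build one report bullet; `min_streak` is an assumed threshold."""
    streak = consecutive_failures(results)
    if streak >= min_streak:
        return f"- {issue} in {job}: failed {streak} times in a row"
    count = failures_in_window(failure_dates, today)
    return f"- {issue} in {job}: happened {count} times in the last 2 weeks (flaky)"
```

For example, `describe("test_foo", "nightly_linux", [False, True, True], [], date(2024, 5, 15))` yields a "failed 2 times in a row" bullet, while a history ending in a pass falls back to the two-week count.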
Blast545 commented 1 week ago

Thanks for tracking everything we discussed; I don't think anything is missing there.

In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

Comparing against our current implementation:

### Build regressions
      * Builds failing today
### Test regressions 
      * Builds with regressions today

I think we could have something along the lines of:


## Higher priority
### Jobs with two consecutive build regressions (X items)
      * link to jobs + sorting by number of days
### Build regressions in latest build of each job (Y items)
      * Builds 
### Jobs without any success and at least 1 entry
      * Link to job
### Jobs not green at least 3 times in a row
      * Builds

## Prioritized test regressions
### Tests failing 3 times in a row
      * Test name and flakiness per build
### Tests failing 3 times in a 2 week window
      * Test name and flakiness per build

## Test regressions all
### Test regressions without an issue
      * Test name and flakiness per build
### Keeping track
      * Issues

We should do our best to keep the "Higher priority" and "Prioritized test regressions" sections as clean and organized as possible, so that people without any buildfarmer payload can browse them.

WDYT? @Crola1702 cc: @claraberendsen

Crola1702 commented 1 week ago

> In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

My idea in adding the "New X items" section was to keep new problems from entering the buildfarm jobs. I don't think we'll ignore issues because they are reported; that's the reason behind the "Reported issues" section, which shows which issues we should update or re-check to keep them up to date. Also, now that I think about it, we can probably add an "All Reported issues" section, like a daily or weekly report of what's being reported.

If I were checking the report, I would look at the number diff, see Reported issues (+10), and then update or close them with new information.
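A rough sketch of how the "Reported issues" staleness check could sit on top of the `gh` CLI. It assumes an installed, authenticated `gh`; the function names, the 7-day threshold, and the `--limit` value are illustrative choices, not part of the existing tooling.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def fetch_open_issues(repo):
    """Return open issues of `repo` as dicts via `gh issue list --json`."""
    out = subprocess.run(
        ["gh", "issue", "list", "--repo", repo, "--state", "open",
         "--limit", "200", "--json", "number,title,updatedAt"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def stale_issues(issues, now, max_age_days=7):
    """Keep issues whose `updatedAt` is older than `max_age_days` (assumed)."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for issue in issues:
        # gh prints timestamps like 2024-05-01T12:00:00Z; normalize the "Z"
        # so datetime.fromisoformat also works on Python versions before 3.11.
        updated = datetime.fromisoformat(issue["updatedAt"].replace("Z", "+00:00"))
        if updated < cutoff:
            stale.append(issue)
    return stale
```

Calling `stale_issues(fetch_open_issues("osrf/buildfarm-tools"), datetime.now(timezone.utc))` would list the issues to re-check; keeping the filter separate from the `gh` call makes it testable without network access.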

### Build regressions in latest build of each job (Y items)

I'm not sure what you mean by "in the latest build of each job".

### Jobs without any success and at least 1 entry

I don't think this is something we should add as higher priority.

As I've said before, IMO, jobs that have never passed are closer to "maintenance" tasks (keep buildfarm green) than to priority tasks (report new regressions to dev teams).

Also, there are multiple jobs that haven't passed in a long time. Additionally, when new releases land, new jobs copy the state of their parents (e.g., the Jazzy release from the Rolling release, or gz-sim- from gz-sim-main), which would add more verbosity to this output. I don't think that amount of verbosity belongs in the higher priority items.

### Jobs not green at least 3 times in a row

This should be divided into build regressions and test regressions. I see some cases with a sequence such as (BR, TR, TR) or (TR, BR, TR) that don't seem important to investigate. I'd rather prioritize builds that are unstable 3 times in a row (warnings or test regressions), while build failures are prioritized above (Jobs with two consecutive build regressions).
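To illustrate why the split matters, here is a minimal sketch that counts the trailing streak per outcome kind. The labels `"BR"` (build regression), `"TR"` (test regression or warnings) and `"OK"` and the helper name are assumptions made for this example.

```python
def trailing_streak(history, kinds):
    """Count how many of the most recent builds fall in `kinds`.

    `history` lists build outcomes oldest first.
    """
    streak = 0
    for outcome in reversed(history):
        if outcome not in kinds:
            break
        streak += 1
    return streak


history = ["BR", "TR", "TR"]
not_green = trailing_streak(history, {"BR", "TR"})  # 3: "not green 3 times in a row"
unstable = trailing_streak(history, {"TR"})         # 2: below a 3-in-a-row threshold
build_failures = trailing_streak(history, {"BR"})   # 0: latest build did compile
```

With a combined "not green" rule this job would be flagged, while the per-kind streaks show that neither the build failure nor the test regression has been consistent yet.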

### Tests failing 3 times in a row

Covered in my comment above


In general, I think it would be valuable to add a Higher Priority section containing new build regressions and consistent test regressions. Jobs without any success would go under "Old issues" (maybe rename it to a "Maintenance" section?), as they're not a priority on a daily basis.

I think it is worth keeping the New X items to investigate (+/- Y): ?? No new issues! for the reasons I mentioned above. We could also rename the "Continue Investigating" section to "Investigation priorities" and put the Higher Priority section there.

Crola1702 commented 1 week ago

Report format:

# Urgent investigations
    New items will have a **NEW** suffix added

    ## Build regressions (all) (+/- X)
        * (known build regressions are ignored)

    ## Not reported consistent Test regressions (3+ consecutive times) (+/- X)
        * [TREAT WARNINGS AS TEST REGRESSIONS]
        * (known test regressions are ignored)

    ## Not reported flaky test regressions (3+ times in a 2 week window) (+/- X)
        * (known test regressions are ignored)

# Maintenance

    ## Jobs that have failed for {x (number of builds < [x]), all time}
        * "All time" first
        * Sorted by number of fails

    ## Reported issues
        * Issue hasn’t been updated in days. Should check!
        * Issue hasn’t happened in days. Should close!
        * Issue doesn't have assignee (Next iteration 2)

    ## Disabled tests: (number)
        * Which (Iteration 2)

# Pending investigations

    ## Build regressions Known
        * All build regressions that don't fit in the constraints above

    ## Test regressions All not reported
        * All test regressions that don't fit in the constraints above

    ## Test regressions reported
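The **NEW** suffix and the (+/- X) counters in this format could be produced by comparing today's items against yesterday's. A minimal sketch, assuming each section is available as a set of item names; `render_section` and the job names in the usage note are hypothetical.

```python
def render_section(title, today, yesterday):
    """Render one report section from two sets of item names.

    The header carries the change in item count since yesterday, and items
    that were absent yesterday get a **NEW** suffix.
    """
    delta = len(today) - len(yesterday)
    lines = [f"## {title} ({delta:+d})"]
    for item in sorted(today):
        suffix = " **NEW**" if item not in yesterday else ""
        lines.append(f"* {item}{suffix}")
    return "\n".join(lines)
```

For instance, `render_section("Build regressions (all)", {"job_a", "job_b"}, {"job_a"})` produces a `(+1)` header and marks only `job_b` as new.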

Features priority (for first iteration)

  1. Urgent investigations
  2. Maintenance (jobs that have failed for…)
  3. Pending Test regressions (check_buildfarm output)

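For the Maintenance item, the ordering described in the report format ("All time" first, then sorted by number of fails) can be expressed as a single sort key. The `FailingJob` record and its fields are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FailingJob:
    name: str
    failed_builds: int   # number of failed builds on record
    never_passed: bool   # True when the job has no successful build at all


def maintenance_order(jobs):
    """Sort "all time" failures first, then by descending number of fails."""
    return sorted(jobs, key=lambda j: (not j.never_passed, -j.failed_builds))
```

Because `False` sorts before `True`, negating `never_passed` puts the "all time" jobs at the top without a separate pass.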
nuclearsandwich commented 3 days ago

> (known build regressions are ignored)

I think it's great to report the breaking news in order to make sure that new issues get the most eyeballs while they're fresh. Going along with @Blast545's concern in #62, I think rather than "ignoring" known regressions, listing them in an appendix (and double points for using markdown anchors to link to that appendix) will help keep those from falling off.
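One possible shape for the appendix idea, as a sketch only: it assumes the report is rendered somewhere that generates GitHub-style heading anchors, and the function name, headings, and link wording are made up for this example.

```python
def render_with_appendix(new_items, known_items):
    """Render new items up front and known regressions in a linked appendix."""
    lines = ["# Urgent investigations", ""]
    lines += [f"* {item}" for item in new_items]
    lines += [
        "",
        f"{len(known_items)} known regressions are listed in the "
        "[appendix](#appendix-known-regressions).",
        "",
        # GitHub-style slug for this heading: appendix-known-regressions
        "# Appendix: known regressions",
        "",
    ]
    lines += [f"* {item}" for item in known_items]
    return "\n".join(lines)
```

This keeps the top of the report short while the known regressions stay one click away instead of being dropped.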