xamarin / xamarin-macios

.NET for iOS, Mac Catalyst, macOS, and tvOS provides open-source bindings of the Apple SDKs for use with .NET managed languages such as C#.

[RFC] How to automate identifying known test failures #3910

Open rolfbjarne opened 6 years ago

rolfbjarne commented 6 years ago

We have frequent test failures in our CI, both random failures and other types. Diagnosing them takes a significant amount of manpower, so automating it somehow would be quite beneficial.

On the other hand, it's often not easy to determine if a failure is already a known issue or not, since failures come in all shapes and sizes. On the more extreme end it could end up becoming some sort of AI project...

Ground rules

AI rules

This is a list of rules about how to match test failures with known issues.

Build failures

NUnit test failures

Examples:

Test execution problems

Examples:

Test crashes

Examples:

Other failures

Data format

An XML file, with a fairly simple syntax to evaluate.

<known-issues version="1.0">
    <known-issue description="human readable description" action="rerun|ignore|fail">
        <condition>
            <and> <!-- matches if all of the nested conditions match -->
                <match testresult="BuildFailure" />
                <or> <!-- matches if any of the nested conditions match -->
                    <match logname="Build log" containsText="CSC0123: The Doctor failed to save Earth." />
                    <match logname="Build log" containsRegularExpression="CSC0124: The TARDIS was lost in time in the year [0-9]*." />
                </or>
                <not>
                    <match testresult="Success" />
                </not>
            </and>
        </condition>
    </known-issue>
</known-issues>
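A minimal sketch (in Python, purely for illustration) of how the `<condition>` tree above could be evaluated against a single test result. The element and attribute names come from the proposed format; the shape of the test-result data (a dict with a `testresult` string and per-log text) and all function names here are my own assumptions, not an existing xharness API.

```python
# Sketch: evaluate the proposed known-issues condition tree against one result.
import re
import xml.etree.ElementTree as ET

def matches(node, result):
    """Return True if condition element `node` matches `result`.

    `result` is assumed to look like:
        {"testresult": "BuildFailure",
         "logs": {"Build log": "...full log text..."}}
    """
    if node.tag == "and":
        return all(matches(child, result) for child in node)
    if node.tag == "or":
        return any(matches(child, result) for child in node)
    if node.tag == "not":
        return not any(matches(child, result) for child in node)
    if node.tag == "match":
        if "testresult" in node.attrib:
            return node.get("testresult") == result["testresult"]
        log = result["logs"].get(node.get("logname"), "")
        if "containsText" in node.attrib:
            return node.get("containsText") in log
        if "containsRegularExpression" in node.attrib:
            return re.search(node.get("containsRegularExpression"), log) is not None
    return False

def known_issue_actions(xml_text, result):
    """Yield (description, action) for every known issue whose condition matches."""
    root = ET.fromstring(xml_text)
    for issue in root.iter("known-issue"):
        condition = issue.find("condition")
        # A <condition> wraps a single boolean element in the example above.
        if condition is not None and all(matches(c, result) for c in condition):
            yield issue.get("description"), issue.get("action")
```

The same walk could also drive the "validate a data file against an existing report" idea: feed each result from a report through `known_issue_actions` and list which known issues fired.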

To ease writing data files, it should be possible to execute/validate them against an existing html report.

spouliot commented 6 years ago

let's start with goals and a bit of context

Goals

Today we have a system that relies on colours. That's fine because it's instinctive.

However, for many (good, bad and ugly) reasons the majority of wrench builds end up being orange (it's better for PRs on Jenkins). The biggest reason for the orangeness is random, known issues. Even low-frequency random issues happen frequently when we have more than 100k gates that can turn a build orange.

Core team members can identify safe builds (the majority) because, as a policy, we investigate build failures and file issues on them (for tracking purposes). This is time-consuming (especially for repeat offenders) and does not help everyone else quickly identify a good build.

spouliot commented 6 years ago

The goals are ambitious (as far as accuracy goes, anyway) but I think we can start small and expand as needed, i.e. if things get too complex then we need to question (and invest in) the tests.

Every additional green build frees up some time to fix something else (instead of reviewing logs). So if we can cheaply solve 2 out of 3 cases then our tree will be largely green, and that would solve 90% of the problem (and 99% of the complaints).

A good example is tonight's https://github.com/xamarin/xamarin-macios/pull/3918

apitest/Mac Unified XM45 32-bit: TimedOut (Execution timed out after 1200 seconds.)

That, if a known issue [1], could easily be ignored [2] or at least give a link to the suspected known issue (but that goes back to #3909)

[1] right now it's a hard problem because we don't have a common/unique way to identify them. Luckily it's more a human problem than a technical one.
[2] maybe it should not be ignored; it's a fairly general message.
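For illustration, that timeout could be expressed in the format proposed above. The `testresult` value, the log name, and the choice of `action="rerun"` here are all assumptions about how xharness would report this case, not settled details:

```xml
<known-issue description="apitest/Mac Unified XM45 32-bit: execution timed out (#3918)" action="rerun">
    <condition>
        <and>
            <match testresult="TimedOut" />
            <match logname="Test log" containsText="Execution timed out after 1200 seconds." />
        </and>
    </condition>
</known-issue>
```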

There are some proposals here that are really great, e.g.

xharness should support rerunning a test as the result of finding a known issue.

but the amount of work to get there seems a lot higher than ignoring known issues.

Also, should it be xharness? Or something else that filters the results? The latter would mean the logic could exist outside the repo (and its branches), which has both pros and cons. We already have the data out of the repo...

Finally, how can the TARDIS be lost in time if you know the year it was lost? Yet another example of why regexes don't make sense ;-)