openshift / sippy

Sippy provides dashboards for OpenShift CI test/job data
https://sippy-bparees.svc.ci.openshift.org/?release=4.5
Apache License 2.0

[RFE] Sippy entry filters #23

Closed: Bowenislandsong closed this issue 3 years ago

Bowenislandsong commented 4 years ago

As a build watcher, I would like to see fewer duplicates of the same bug surfacing through different Sippy entries. Sometimes tests fail because of quota insufficiency. These Sippy entries get triaged every day by different build watchers. We need a way to either mark an entry as won't-fix in Sippy or create a bug for it. Alternatively, a non-hacky solution could be adding a filter to Sippy.

bparees commented 4 years ago

@Bowenislandsong I'm not following your scenario; can you give me an example?

Note that sippy is based around the following model:

1) Did a test fail?
2) What is the name of the test that failed?
3) Does https://search.apps.build01.ci.devcluster.openshift.com/ return a BZ match for the test name?
4) If yes: associate the bug with the failing test.
5) If no: report the test failure as not having a known bug.
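A minimal sketch of that model in Go (the function names and plain string/slice types are hypothetical illustrations, not Sippy's actual code; only the search URL parameters are taken from the links quoted in this thread):

```go
package main

import (
	"fmt"
	"net/url"
)

// bugSearchURL builds a query against the CI search service for an exact
// test name, using the same parameters as the search links in this thread.
func bugSearchURL(testName string) string {
	q := url.Values{}
	q.Set("search", testName)
	q.Set("maxAge", "168h")
	q.Set("type", "bug+junit")
	return "https://search.apps.build01.ci.devcluster.openshift.com/?" + q.Encode()
}

// classifyFailure covers steps 4 and 5: matchedBugs holds whatever BZ IDs
// the search returned for the failed test's name.
func classifyFailure(testName string, matchedBugs []string) string {
	if len(matchedBugs) > 0 {
		return fmt.Sprintf("%q -> known bug(s): %v", testName, matchedBugs)
	}
	return fmt.Sprintf("%q -> no known bug", testName)
}

func main() {
	name := "Build image ansible from the repository"
	fmt.Println(bugSearchURL(name))         // step 3: where the lookup goes
	fmt.Println(classifyFailure(name, nil)) // step 5: no match found
}
```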

Sippy has no data about why a particular test failed (not easily, anyway, and not in a way that could be gathered efficiently), so patterns like "all these tests keep failing because of AWS issues" are going to be a weak spot in terms of what it can help with.

Bowenislandsong commented 4 years ago

You are right that Sippy should not need to store this data, but maybe we can still reduce the duplicated effort by allowing either (shareable) filtering or automatic/manual association of bug reports.

Example: different platform, same bug

Bug 1852992 covers [sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (filesystem volmode)] volumeMode should not mount / map unused volumes in a pod. On https://sippy-bparees.svc.ci.openshift.org/?release=4.6, today, the failing entry is [sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Pre-provisioned PV (block volmode)] volumeMode should not mount / map unused volumes in a pod: a different driver, but the same underlying bug.

Example: quota insufficiency + empty search (this part might be a regex-related bug)

https://search.apps.build01.ci.devcluster.openshift.com/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Deploy+Image%5C.Deploy+Image+page%5C.should+deploy+the+image+and+display+it+in+the+topology is an entry on Sippy, but the search comes up empty. I altered the search to https://search.apps.build01.ci.devcluster.openshift.com/?search=should+deploy+the+image+and+display+it+in+the+topology&maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job and the results came up. However, these are obviously quota-insufficiency failures, which I want to avoid filing a bug for.

So for these two scenarios, where human attention is needed to make a one-time judgment, I would like a way to record that judgment so that the next watcher does not have to work through the same problem. One argument is that we should file bugs for these and immediately close them as duplicates or won't-fix, but we could also pursue solutions that do not involve opening bugs just to close them. Whichever route we take, I think we should document the expected outcome (even if it is "open a bug and close it so people won't see the entry in Sippy again") in the build watcher's responsibility description.

bparees commented 4 years ago

> automatic/manual association of bug reports
> Example: different platform, same bug

You can link any existing BZ to a particular test by adding the test name to the BZ, either in a comment or in the environment field. Sippy will pick up the association during its next refresh (hourly). So if you think both tests are really the same bug, just list the full test names for both tests in the bug's comments or environment section.
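To picture the rule, here is a rough Go sketch of that matching logic, assuming a hypothetical Bug type; the real behavior is whatever Sippy's hourly refresh implements:

```go
package main

import (
	"fmt"
	"strings"
)

// Bug is a hypothetical stand-in for a Bugzilla entry.
type Bug struct {
	ID          int
	Environment string
	Comments    []string
}

// mentionsTest reports whether the full test name appears verbatim in the
// bug's environment field or in any comment, which is the association rule
// described above.
func (b Bug) mentionsTest(testName string) bool {
	if strings.Contains(b.Environment, testName) {
		return true
	}
	for _, c := range b.Comments {
		if strings.Contains(c, testName) {
			return true
		}
	}
	return false
}

func main() {
	azure := "[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (filesystem volmode)] volumeMode should not mount / map unused volumes in a pod"
	gce := "[sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Pre-provisioned PV (block volmode)] volumeMode should not mount / map unused volumes in a pod"

	// Listing both full test names in one comment links the bug to both.
	bz := Bug{ID: 1852992, Comments: []string{azure + "\n" + gce}}
	fmt.Println(bz.mentionsTest(azure), bz.mentionsTest(gce)) // true true
}
```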

> Example: quota insufficiency + empty search (this part might be a regex-related bug)

Yes, the dots should not be escaped in that search string; I'm not sure why they are. Sippy generated that search query? I'd be curious where you're seeing that.
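For what it's worth, one plausible source of those `\.` sequences (an assumption, not confirmed in this thread) is the test name being passed through a regex-escaping helper such as Go's regexp.QuoteMeta before being URL-encoded:

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

func main() {
	name := "Deploy Image.Deploy Image page.should deploy the image and display it in the topology"

	// QuoteMeta escapes regex metacharacters, turning each "." into `\.`;
	// URL-encoding that result produces the %5C. sequences visible in the
	// empty-result search link quoted above.
	escaped := regexp.QuoteMeta(name)
	fmt.Println(url.Values{"search": {escaped}}.Encode())
}
```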

> However, these are obviously quota-insufficiency failures, which I want to avoid filing a bug for.

I would not expect a test to reach the top of the "common test failures" list solely because it was impacted by quota issues (the "setup" tests, which just install the cluster, may be an exception, but Sippy normally ignores those entirely). It may be that one or two failures of a given test are due to quota, but if a test is failing often enough to make the top-failures list, there are other failures that are not quota related, so you need to look through more of the test failures (look at different runs) to find the real/more common failure cause.

Bowenislandsong commented 4 years ago

> Yes, the dots should not be escaped in that search string; I'm not sure why they are. Sippy generated that search query? I'd be curious where you're seeing that.

If you take a look at any entry whose name contains a ., it will not return any search results. See operator.Build image ansible from the repository on today's https://sippy-bparees.svc.ci.openshift.org/?release=4.4. I don't know any other way to reference this; there isn't a way to share a link to it.

Bowenislandsong commented 4 years ago

About the other points, I guess I was not familiar enough with Sippy to know this. I will update the "watcher" doc accordingly. We can keep discussing the Sippy-generated entries that return null results here; otherwise you can close this. Thank you.

bparees commented 4 years ago

> If you take a look at any entry whose name contains a ., it will not return any search results. See operator.Build image ansible from the repository on today's https://sippy-bparees.svc.ci.openshift.org/?release=4.4. I don't know any other way to reference this; there isn't a way to share a link to it.

The issue here is that Sippy is being told the failed test's name is operator.Build image ansible from the repository.

That name comes from testgrid: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-ocp-e2e-aws-scaleup-rhel7-4.4

But the junit from the job itself does not contain that test name: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6/1279982296945922048

The junit name appears to be Build image ansible from the repository (that is, the operator. prefix is not present).
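To illustrate the mismatch, a consumer reconciling the two sources could also try the testgrid name with the operator. prefix stripped; this is a hypothetical workaround only (per the later comment, Sippy instead just ignores these tests):

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the names to try when looking up a testgrid test
// in junit output: the name as reported, plus (if present) the name with
// the "operator." prefix removed.
func candidateNames(testgridName string) []string {
	names := []string{testgridName}
	if trimmed := strings.TrimPrefix(testgridName, "operator."); trimmed != testgridName {
		names = append(names, trimmed)
	}
	return names
}

func main() {
	// Prints both the reported name and the prefix-stripped junit name.
	fmt.Println(candidateNames("operator.Build image ansible from the repository"))
}
```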

@smarterclayton @openshift/openshift-team-developer-productivity-test-platform can you explain why testgrid has a different test name than what is seen in prow?

bparees commented 4 years ago

(as an aside, sippy will now ignore those tests: https://github.com/openshift/sippy/pull/25)

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 3 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/sippy/issues/23#issuecomment-751127457):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.