/remove-help /assign
Tangentially related: it would be nice to know whether we even need to use --stage=gs://kubernetes-release-pull (ref https://github.com/kubernetes/test-infra/issues/18789). I already migrated pull-kubernetes-e2e-gce-ubuntu-containerd, which uses it, so I'll do the same here. But I would then like to remove the flag if it's not needed, or migrate to the kubernetes.io-owned gs://k8s-release-pull if it is needed.
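For context, a hedged sketch of where that flag sits: as I understand the setup, --stage is passed through the job's container args down to kubetest. Everything around the flag below is illustrative rather than copied from the real job definition, which lives under config/jobs/ in kubernetes/test-infra:

```yaml
# Illustrative fragment only; see config/jobs/ for the real definition.
- name: pull-kubernetes-e2e-gce
  spec:
    containers:
      - args:
          # Stages the freshly built release artifacts to this GCS bucket
          # so the e2e run can consume them; the open question above is
          # whether the job still needs to stage at all.
          - --stage=gs://kubernetes-release-pull
```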
Opened https://github.com/kubernetes/test-infra/pull/18916
The main branch and 1.19 variants aren't merge-blocking anymore, but the earlier-branch variants are. Moving them all over.
https://github.com/kubernetes/test-infra/pull/18916 merged 2020-08-19 16:40 PT
https://prow.k8s.io/?job=pull-kubernetes-e2e-gce - shows a reasonable amount of traffic, since there is currently a push to get PRs landed in time for the final cut of Kubernetes v1.16. The only failures appear to be flakes.
https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-e2e-gce&graph-metrics=test-duration-minutes - overall the job duration is less spiky and has maybe gone slightly down over time
https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&job=pull-kubernetes-e2e-gce%24 - no real change in errors
https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce - I'm seeing https://github.com/kubernetes/test-infra/issues/19034 here; I'd like to understand whether this job's migration caused that issue or whether something else did.
CPU utilization - big spikes at the beginning for the build, then nothing.
Memory utilization - looks about right.
So if it turns out https://github.com/kubernetes/test-infra/issues/19034 is unrelated to this change, we're good. But I need to dig into that a little more first.
@spiffxp I moved this to In Progress. Will have a look at #19034 ...
@RobertKielty @spiffxp I would like to work on this issue
@spiffxp can you help me understand what the following means, and how it affects the changes to be made for this issue:
it's being demoted from merge-blocking on release-1.19 and the main branch (as of #18832)
@snowmanstark
So, the changes have already been made via https://github.com/kubernetes/test-infra/pull/18916 (see https://github.com/kubernetes/test-infra/issues/18852#issuecomment-676792471)
The reason this is still open is that https://github.com/kubernetes/test-infra/issues/19034 is unexplained, and it may have happened around the same time https://github.com/kubernetes/test-infra/pull/18916 merged. If we can either prove that https://github.com/kubernetes/test-infra/pull/18916 didn't cause it (see https://github.com/kubernetes/test-infra/issues/19034#issuecomment-684130355), or fix https://github.com/kubernetes/test-infra/issues/19034, then this issue can be closed.
To answer your question: https://github.com/kubernetes/test-infra/pull/18832 set always_run to false for the main branch (where v1.19 was then under development) and for the release-1.19 branch. There is no run_if_changed for the job, so it's not considered merge-blocking for those branches. It is still merge-blocking for older branches (release-1.18, release-1.17), as we generally don't backport policy or test changes to already-released versions of Kubernetes except under special circumstances.
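To make that concrete, here is a hypothetical presubmit stanza; always_run, run_if_changed, and branches are real Prow presubmit fields, but the branch filter and layout here are made up for illustration:

```yaml
presubmits:
  kubernetes/kubernetes:
    - name: pull-kubernetes-e2e-gce
      branches:
        - master  # hypothetical filter, for illustration only
      # With always_run: false and no run_if_changed, the job only runs
      # when explicitly triggered (e.g. via /test), so it is not counted
      # as merge-blocking on these branches.
      always_run: false
```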
The demotion complicates things because the job doesn't see as much traffic as jobs that always run for all branches, so it's harder to avoid variance due to the smaller sample size, and thus harder to make a judgment call on "does everything still look OK?"
However, I saw enough traffic in https://github.com/kubernetes/test-infra/issues/18852#issuecomment-684128619 while cherry picks were being swept through in advance of upcoming patch releases. So aside from the open question of https://github.com/kubernetes/test-infra/issues/19034, I think this looks good.
Thanks @spiffxp for that explanation. It makes total sense to me now. I'll look into #19034 too to get this closed.
@spiffxp I looked into #19034 and nothing seems to be off there.
Hi @spiffxp, can this issue be closed now?
I've updated #19034
We definitely need to review #19034, but I'm confused as to how these two issues are related.
Per https://github.com/kubernetes/test-infra/issues/18852#issuecomment-692196542 the reason I held this open is because I'm still not certain that migration of this job did not cause https://github.com/kubernetes/test-infra/issues/19034. But we've lived with it unresolved for about 90d now, so I guess we can live with it unexplained for longer.
/close
@spiffxp: Closing this issue.
What should be cleaned up or changed:
This is part of #18550
To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.
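For reference, a rough sketch of what such a groups.yaml entry looks like; the field names follow the kubernetes/k8s.io groups format, but the description and any existing settings may differ in the real file:

```yaml
- email-id: k8s-infra-prow-viewers@kubernetes.io
  name: k8s-infra-prow-viewers
  description: |-
    Read access to monitoring dashboards for k8s-infra build clusters
  members:
    - you@example.com  # hypothetical; add your own address here
```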
Migrate pull-kubernetes-e2e-gce to k8s-infra-prow-build by adding a cluster: k8s-infra-prow-build field to the job (a minimal sketch follows the note below).

NOTE: migrating this job is not as straightforward as some of the other #18550 issues, because:
- it's being demoted from merge-blocking on release-1.19 and the main branch (as of #18832)
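A minimal sketch of the change, assuming the usual presubmit layout; only the cluster line is the actual change, and everything else would already be present in the job definition:

```yaml
- name: pull-kubernetes-e2e-gce
  # Routes the job to the community-owned k8s-infra-prow-build cluster
  # instead of the default build cluster.
  cluster: k8s-infra-prow-build
```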
Once the PR has merged, note the date/time it merged. This will allow you to compare before/after behavior.
Things to watch for the job:
- pull-kubernetes-e2e-gce for 6h
- pull-kubernetes-e2e-gce for 6h

Things to watch for the build cluster:
Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.
/wg k8s-infra /sig testing /area jobs /help