kubernetes / test-infra

Test infrastructure for the Kubernetes project.

Move all jobs to using logexporter and make it default #4046

Open · shyamjvs opened this issue 7 years ago

Ref https://github.com/kubernetes/kubernetes/issues/48513

Logexporter has been stabilized after multiple fixes. It's been enabled in most of our scalability jobs and is working just fine. E.g.:

- https://k8s-testgrid.appspot.com/google-gce-scale#gce-scale-performance (our 5k-node test, where the time taken for log dumping has dropped from >4hr to <20min)
- https://k8s-testgrid.appspot.com/google-gce-scale#gce (our 100-node test, where the time taken for log dumping has dropped from ~10min to ~2min)
- https://k8s-testgrid.appspot.com/google-gce-scale#gci-gce

It does the dumping cleanly, in parallel across all nodes in the cluster. Besides saving time, it also saves a lot of disk space and inodes in the job containers. We should move all our jobs to using it (except ones on release branches that don't have the logexporter changes yet).
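
To illustrate where the speedup comes from, here's a rough sketch of the parallel idea. This is not logexporter's actual code (logexporter runs as short-lived pods on the nodes that tar up local logs and upload them straight to GCS, which is also why nothing passes through the job container's disk); the node iteration, log paths, and bucket below are made up:

```bash
#!/usr/bin/env bash
# Rough sketch only: dump logs from every node in parallel instead of
# serially. Paths and the bucket are illustrative, not logexporter's code.
set -o errexit -o pipefail

GCS_PATH="gs://example-bucket/logs/${BUILD_ID:-dev}"  # made-up bucket

for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  (
    gcloud compute ssh "${node}" --command \
      "sudo tar czf /tmp/logs.tar.gz /var/log/kubelet.log /var/log/kube-proxy.log"
    gcloud compute scp "${node}:/tmp/logs.tar.gz" "/tmp/${node}.tar.gz"
    gsutil cp "/tmp/${node}.tar.gz" "${GCS_PATH}/${node}/"
  ) &  # one background job per node: wall time ~= slowest node, not the sum
done
wait  # block until every node's upload finishes
```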

cc @fejta @kubernetes/test-infra-maintainers @kubernetes/sig-scalability-misc

krzyzacy commented 7 years ago

How about enabling it for canary jobs first? If those work out, we can flip the flag to be enabled by default.

shyamjvs commented 7 years ago

That's a good idea. But we don't want to enable it by default yet, as it wouldn't work for release-branch jobs.
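
To make the constraint concrete, the eventual default flip would need a guard along these lines. A hypothetical sketch only; the variable names and the version cutoff are illustrative, not the real log-dump.sh interface:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: default to logexporter, but fall back to the old
# serial SSH path on release branches that predate logexporter support.
# Variable names here are illustrative, not the real log-dump.sh interface.

K8S_RELEASE="${K8S_RELEASE:-1.9}"           # e.g. set by the job config
USE_LOGEXPORTER="${USE_LOGEXPORTER:-true}"  # canary jobs would flip this first

# Treat 1.8+ as shipping logexporter support (an assumption for this sketch).
if [[ "${USE_LOGEXPORTER}" == "true" ]] && \
   [[ "$(printf '%s\n' '1.8' "${K8S_RELEASE}" | sort -V | head -n1)" == "1.8" ]]; then
  echo "dumping logs via logexporter (parallel, in-cluster upload to GCS)"
else
  echo "dumping logs via serial SSH (legacy path for old release branches)"
fi
```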

shyamjvs commented 7 years ago

@krzyzacy Can you bump the prow image with the new kubekins image I changed in https://github.com/kubernetes/test-infra/pull/4187? This is required to enable logexporter for all GKE jobs. Thanks.

krzyzacy commented 7 years ago

@shyamjvs you can run ./experiment/bump_e2e_image.sh and push the commits
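
For anyone following along, the flow is roughly the following (the branch name and commit message are placeholders, and the script's exact effects depend on the repo state at the time):

```bash
# From the root of a kubernetes/test-infra checkout: regenerate the job
# configs to point at the freshly built kubekins-e2e tag.
./experiment/bump_e2e_image.sh

# Review, commit, and push the generated changes, then open a PR.
git checkout -b bump-kubekins-e2e        # placeholder branch name
git add -A
git commit -m "Bump kubekins-e2e image"  # placeholder message
git push origin bump-kubekins-e2e
```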

shyamjvs commented 7 years ago

Done, thanks: https://github.com/kubernetes/test-infra/pull/4197

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.

/lifecycle stale

shyamjvs commented 6 years ago

/remove-lifecycle stale
/lifecycle frozen

@krzyzacy Is there anything stopping us from making logexporter the default for all jobs now? I can't think of anything. Previously we didn't have logexporter support in older k8s releases, but those jobs now run against newer releases.

krzyzacy commented 6 years ago

@shyamjvs Not really, as far as logexporter goes, but I'd like to see the logic in log-dump.sh move into kubetest.
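
For example, kubetest could own the GCS hand-off directly, something like the invocation below. A hedged sketch: the `--logexporter-gcs-path` flag is what later appeared in job configs, but the provider choice and bucket here are placeholders:

```bash
# Sketch of a kubetest invocation that routes log dumping through
# logexporter; the bucket path is a placeholder.
kubetest \
  --provider=gce \
  --up --test --down \
  --logexporter-gcs-path="gs://example-bucket/logs/${JOB_NAME}/${BUILD_ID}"
```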

fejta commented 6 years ago

/remove-lifecycle frozen

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten
/remove-lifecycle stale

BenTheElder commented 6 years ago

/remove-lifecycle stale


fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close

BenTheElder commented 6 years ago

/remove-lifecycle stale
/reopen


k8s-ci-robot commented 6 years ago

@BenTheElder: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to [this](https://github.com/kubernetes/test-infra/issues/4046#issuecomment-399781899):

> /remove-lifecycle stale
> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

BenTheElder commented 6 years ago

https://github.com/kubernetes/test-infra/issues/4425#issuecomment-399781968

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

wojtek-t commented 5 years ago

/remove-lifecycle stale
/lifecycle frozen

spiffxp commented 4 years ago

/remove-lifecycle frozen

I'm taking the lack of any action on this in ~2 years as a sign that it's no longer that important.

BenTheElder commented 4 years ago

/sig scalability

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

wojtek-t commented 3 years ago

/remove-lifecycle stale

> I'm taking the lack of any action on this in ~2 years as a sign that it's no longer that important.

I think it is important. It just means that we don't have enough capacity to push on all the important things...

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

wojtek-t commented 3 years ago

/remove-lifecycle stale
/lifecycle frozen

spiffxp commented 3 years ago

/kind cleanup