kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

Migrate kettle to k8s-infra #787

Closed · spiffxp closed this issue 4 months ago

spiffxp commented 4 years ago

Part of migrating away from gcp-project k8s-gubernator: https://github.com/kubernetes/k8s.io/issues/1308

My suggestions for target:

/wg k8s-infra
/area cluster-infra
/sig testing

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

spiffxp commented 4 years ago

/remove-lifecycle stale

Kettle is still running there.

spiffxp commented 4 years ago

Migrating kettle most likely looks something like

spiffxp commented 4 years ago

FYI @MushuEE: given that you've been modifying kettle lately, if you happen to see things that could help inform a plan for this, drop them here.

MushuEE commented 4 years ago

When you say

> migrate the bigquery database kettle writes to

is that to a new project? What are the target project and target cluster?

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

spiffxp commented 3 years ago

/remove-lifecycle stale

spiffxp commented 3 years ago

/assign @MushuEE @spiffxp to investigate possible approaches

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

BenTheElder commented 3 years ago

any updates @MushuEE?

BenTheElder commented 3 years ago

/remove-lifecycle stale

ameukam commented 3 years ago

/milestone clear

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

ameukam commented 3 years ago

/remove-lifecycle stale
/milestone v1.23

spiffxp commented 3 years ago

/remove-priority important-longterm
/priority important-soon

ameukam commented 2 years ago

/assign
/milestone v1.24

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 2 years ago

/remove-lifecycle stale

ameukam commented 2 years ago

/milestone clear
/lifecycle frozen
/priority backlog

BenTheElder commented 5 months ago

I don't have much access to the k8s-gubernator project currently, so it's a bit difficult to do much to help here. I'm going to ask around about access.

The deployment details are, at least, more or less in the repo: https://github.com/kubernetes/test-infra/blob/master/kettle/.

BenTheElder commented 5 months ago

@ixdy still works at Google and still had access, despite having long since moved on from cloud work ... @liggitt and I now have owner access to the k8s-gubernator project for continuity until we can migrate it. Thanks Jeff!

This still needs to happen before the prow default cluster shutdown in August and sooner is better.

BenTheElder commented 5 months ago

So we still have one "g8r" cluster on 1.26.11-gke.1055000 with 3 node pools:

- "pool-1" (e2-highmem-16, 1 node)
- "pool-highmem" (n1-highmem-8, 2 nodes)
- "pool-large" (n1-standard-8, 0 nodes)

It is running "kettle" and "kettle-staging" deployments with one pod each.

Each of those has a PD-SSD: 3001 GB and 201 GB respectively.

There are some BigQuery datasets in this project; build/all is 1.67 TB.

BenTheElder commented 5 months ago

Given that kettle initially ingests this data from the prow GCS logs, I think we should probably look at cold-starting a new instance running in AAA, just overriding the cluster/project and deploying with the existing tooling.

There's a lot to be desired around auto-deployment, etc., however.
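Concretely, the cold start could be sketched roughly as below. This is a dry-run sketch only: the project, cluster, and region names are assumptions for illustration, not values confirmed in this thread, and the manifest path follows the kettle directory linked above (verify against the repo before use).

```shell
#!/usr/bin/env bash
# Dry-run sketch of cold-starting kettle against a new cluster/project.
# PROJECT, CLUSTER, and REGION are illustrative assumptions.
set -euo pipefail

PROJECT="${PROJECT:-kubernetes-public}"
CLUSTER="${CLUSTER:-aaa}"
REGION="${REGION:-us-central1}"

# Print each step instead of executing it, so the plan can be reviewed first.
run() { echo "+ $*"; }

run gcloud container clusters get-credentials "$CLUSTER" --project "$PROJECT" --region "$REGION"
run kubectl apply -f kettle/deployment.yaml   # existing tooling, new target
```

Dropping the `run` wrapper would execute the steps for real; the point is just that the existing manifests are reused and only the target changes.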

BenTheElder commented 5 months ago

I think @dims has this working. One remaining item: once we're confident this is done, let Googlers know and we'll see about turning down the old instance / GCP project ... (FYI @michelle192837 @cjwagner)

dims commented 5 months ago

@BenTheElder I want to watch it for a week before we can call it done!

michelle192837 commented 5 months ago

Exciting stuff! :D Thanks y'all!

BenTheElder commented 4 months ago

[I scaled the old cluster down to zero this week; we'll check back next week.]
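Scaling the pools to zero (rather than deleting outright) keeps the rollback path open during the watch period. A dry-run sketch, using the pool names from the inventory earlier in the thread; everything else here is an illustrative assumption:

```shell
# Dry-run sketch: scale each g8r node pool to zero ahead of deletion.
# Pool names are from the earlier inventory; other details are illustrative.
run() { echo "+ $*"; }   # print instead of executing

for pool in pool-1 pool-highmem pool-large; do
  run gcloud container clusters resize g8r --node-pool "$pool" --num-nodes 0 --quiet
done
```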

dims commented 4 months ago

thanks @BenTheElder

dims commented 4 months ago

https://storage.googleapis.com/k8s-triage/index.html is being updated, and the flakes JSON looks good as well.

```
❯ gsutil ls -l gs://k8s-metrics
       114  2024-05-02T00:05:31Z  gs://k8s-metrics/build-stats-latest.json
     10040  2024-05-02T00:04:51Z  gs://k8s-metrics/failures-latest.json
    103224  2024-05-02T00:04:20Z  gs://k8s-metrics/flakes-daily-latest.json
    204024  2024-05-02T00:05:48Z  gs://k8s-metrics/flakes-latest.json
         5  2024-05-02T00:04:09Z  gs://k8s-metrics/job-flakes-latest.json
    376585  2024-05-02T00:05:08Z  gs://k8s-metrics/job-health-latest.json
         3  2024-05-02T00:05:20Z  gs://k8s-metrics/pr-consistency-latest.json
     83496  2024-05-02T00:04:36Z  gs://k8s-metrics/presubmit-health-latest.json
         3  2024-05-02T00:06:01Z  gs://k8s-metrics/weekly-consistency-latest.json
                                  gs://k8s-metrics/build-stats/
                                  gs://k8s-metrics/failures/
                                  gs://k8s-metrics/flakes-daily/
                                  gs://k8s-metrics/flakes/
                                  gs://k8s-metrics/istio-job-flakes/
                                  gs://k8s-metrics/job-flakes/
                                  gs://k8s-metrics/job-health/
                                  gs://k8s-metrics/pr-consistency/
                                  gs://k8s-metrics/presubmit-health/
                                  gs://k8s-metrics/weekly-consistency/
TOTAL: 9 objects, 777494 bytes (759.27 KiB)
```

We can turn down the old cluster early next week @BenTheElder
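For the watch period, a quick freshness check over `gsutil ls -l` style output can flag any `*-latest.json` object that stopped updating. This is a hypothetical helper, not tooling from this thread; the sample lines below mirror the listing above, with one date altered to demonstrate a stale hit:

```shell
# Flag any *-latest.json object whose upload date predates a cutoff.
# `gsutil ls -l` field layout: $1 = size, $2 = ISO-8601 timestamp, $3 = URL.
# ISO dates sort lexically, so a plain string compare is enough.
awk -v cutoff="2024-05-01" '
  $3 ~ /-latest\.json$/ {
    split($2, dt, "T")                 # dt[1] = "YYYY-MM-DD"
    if (dt[1] < cutoff) print "STALE:", $3
  }
' <<'EOF'
       114  2024-05-02T00:05:31Z  gs://k8s-metrics/build-stats-latest.json
       100  2024-03-01T00:00:00Z  gs://k8s-metrics/flakes-latest.json
EOF
# prints: STALE: gs://k8s-metrics/flakes-latest.json
```

Piping live `gsutil ls -l gs://k8s-metrics` output into the same awk program would check the real bucket.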

BenTheElder commented 4 months ago

SGTM. At some point I'd like to turn down the bigquery datasets and anything else lingering in that project as well.

BenTheElder commented 4 months ago

/assign

Will plan to turn down and delete everything in the old project this week.

ameukam commented 4 months ago

> /assign Will plan to turn down and delete everything in the old project this week.

@BenTheElder also update https://github.com/kubernetes/k8s.io/issues/1308 and close it? 🥺

BenTheElder commented 4 months ago

Remaining follow-up will be tracked in https://github.com/kubernetes/k8s.io/issues/1308.