kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

etcd-io Infra and CI Migration #6102

Open upodroid opened 8 months ago

upodroid commented 8 months ago

etcd is now a subproject of Kubernetes and etcd maintainers are looking to adopt the CI system and Infra management approach we use for Kubernetes.

### GitHub
- [ ] https://github.com/kubernetes/org/pull/4498
### Infra
- [ ] https://github.com/kubernetes/k8s.io/issues/6600
- [ ] Enumerate all the GCP projects used by etcd and discuss what to do with them. Some GCP resources are mentioned in the etcd release notes. https://github.com/etcd-io/etcd/releases/tag/v3.5.10
- [ ] Enumerate all the *.etcd.io services and migrate them to the community cluster. https://github.com/kubernetes/k8s.io/blob/main/running-in-community-clusters.md
- [ ] Deploy arm64 nodepools on the `k8s-infra-prow-build` GKE cluster. etcd needs to run e2e tests on arm64 hardware: https://github.com/etcd-io/etcd/pull/16801
- [ ] Explore serving etcd images at `registry.k8s.io` and resolve the mismatch between https://explore.ggcr.dev/?repo=registry.k8s.io%2Fetcd and https://explore.ggcr.dev/?repo=gcr.io%2Fetcd-development%2Fetcd
- [ ] Verify whether the `etcd.io` domain is being used for GSuite. If not, attach it to the kubernetes.io GSuite and create the relevant mailing lists for it.
### Testing
- [ ] https://github.com/kubernetes/test-infra/pull/31218
- [ ] https://github.com/kubernetes/test-infra/pull/31257
- [ ] Enable additional Prow plugins, particularly tide, lgtm, and approve.
- [ ] https://github.com/kubernetes/test-infra/issues/31273
- [ ] https://github.com/kubernetes/test-infra/pull/31421
- [ ] Fix the main etcd e2e job https://testgrid.k8s.io/sig-etcd-periodics#ci-etcd-e2e-amd64
- [ ] Create a prowjob for each of the GitHub Actions workflows at https://github.com/etcd-io/etcd/tree/main/.github/workflows

If I missed something, feel free to comment on the issue and I'll update the tracker.

/cc @jmhbnz @serathius @wenjiaswe @mrbobbytables @ahrtr @ameukam @BenTheElder

/sig etcd /sig testing /priority important-soon /kind feature

serathius commented 8 months ago

Have we stopped running presubmits? I stopped seeing them in GitHub PRs and https://testgrid.k8s.io/sig-etcd-presubmits seems empty. Nothing in https://prow.k8s.io/?repo=etcd-io%2Fetcd either.

upodroid commented 8 months ago

There is a thread in #testing-ops on Slack to investigate this issue https://kubernetes.slack.com/archives/C7J9RP96G/p1700688510160169

serathius commented 8 months ago

Interesting flake in unit test:

=== FAIL: storage/schema TestMigrate/Upgrading_3.6_to_v3.7_is_not_supported (0.01s)
    logger.go:130: 2023-11-23T15:59:05.436Z WARN    failed to preallocate an initial WAL file   {"path": "/tmp/TestMigrateUpgrading_3.6_to_v3.7_is_not_supported2935238209/002/etcd_wal_test4014423837/wal.tmp/0000000000000000-0000000000000000.wal", "segment-bytes": 64000000, "error": "no space left on device"}
    schema_test.go:207: Failed to create WAL: no space left on device
    --- FAIL: TestMigrate/Upgrading_3.6_to_v3.7_is_not_supported (0.01s)

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/17008/pull-etcd-unit-test/1727717754938593280

upodroid commented 8 months ago

Does that unit test expect the pod to have an ephemeral volume of a specific size?

serathius commented 8 months ago

I think the problem might stem from the etcd WAL tests. I don't think the unit tests mock storage; they just write to t.TempDir() (which should be under /tmp/ by default). WAL creation preallocates 64MB, so if a couple of such tests run without cleanup, we could be allocating a couple of hundred megabytes.
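
To make the arithmetic above concrete, here is a rough sketch in Python (not etcd code). The segment size comes from the `segment-bytes` field in the log above; the function names and the free-space figure in the usage loop are hypothetical, chosen only for illustration.

```python
# Segment size taken from the "segment-bytes" field in the failing test's log.
WAL_SEGMENT_BYTES = 64_000_000


def peak_tmp_usage_bytes(live_wal_files: int,
                         segment_bytes: int = WAL_SEGMENT_BYTES) -> int:
    """Bytes held in the temp dir when `live_wal_files` preallocated
    WAL segments exist at the same time (hypothetical helper)."""
    if live_wal_files < 0:
        raise ValueError("live_wal_files must be non-negative")
    return live_wal_files * segment_bytes


def fits(live_wal_files: int, free_bytes: int,
         segment_bytes: int = WAL_SEGMENT_BYTES) -> bool:
    """True if that many preallocated segments fit in `free_bytes`."""
    return peak_tmp_usage_bytes(live_wal_files, segment_bytes) <= free_bytes


if __name__ == "__main__":
    # Assumed free space on the pod's scratch disk; not a measured value.
    assumed_free = 300_000_000
    for n in (1, 4, 5):
        used = peak_tmp_usage_bytes(n)
        print(f"{n} WAL files -> {used} bytes, fits: {fits(n, assumed_free)}")
```

Under these assumptions, a handful of tests that each leave one preallocated segment behind is enough to exhaust a few hundred megabytes of scratch space.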

wenjiaswe commented 7 months ago

cc @siyuanfoundation

BenTheElder commented 7 months ago

we could mount an emptyDir (disk or memory) to /tmp if etcd tests are writing to it heavily.
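
As a rough illustration of that suggestion, the sketch below builds the pod-spec fragments for an emptyDir mounted at /tmp and prints them as JSON. The `emptyDir`, `medium`, `sizeLimit`, `volumeMounts`, and `mountPath` fields are standard Kubernetes pod-spec fields; the volume name, the size limit, and the helper function itself are assumptions, not taken from any existing job config.

```python
import json


def tmp_emptydir_fragments(volume_name: str = "tmp-scratch",
                           in_memory: bool = False,
                           size_limit: str = "1Gi") -> dict:
    """Return pod-spec fragments mounting an emptyDir at /tmp.

    `volume_name` and `size_limit` are illustrative defaults (assumptions).
    """
    empty_dir = {"sizeLimit": size_limit}
    if in_memory:
        # medium: Memory backs the volume with tmpfs instead of node disk.
        empty_dir["medium"] = "Memory"
    return {
        # Goes under the pod's spec.volumes.
        "volumes": [{"name": volume_name, "emptyDir": empty_dir}],
        # Goes under the test container's volumeMounts.
        "volumeMounts": [{"name": volume_name, "mountPath": "/tmp"}],
    }


if __name__ == "__main__":
    print(json.dumps(tmp_emptydir_fragments(in_memory=True), indent=2))
```

A memory-backed volume counts against the container's memory limit, so the disk-backed variant may be the safer default for tests that preallocate large files.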

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

upodroid commented 4 months ago

/lifecycle frozen

serathius commented 3 months ago

Note for robustness tests: one piece of GitHub Actions functionality was not migrated, namely uploading test report artifacts.

cblecker commented 3 months ago

etcd DNS was migrated in https://github.com/kubernetes/k8s.io/issues/6600.

Today I migrated the etcd Netlify site to the Kubernetes account and requested that the CNCF close the etcd Netlify account in CNCFSD-2245.

ArkaSaha30 commented 3 months ago

Hello 👋 I am willing to take up some of the workflow issues for migration to Prowjobs