Closed chrisdoherty4 closed 1 year ago
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chrisdoherty4
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Name | Link |
---|---|
Latest commit | 9f73daea8c148f6067e8144fe330bff83c835f43 |
Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-cluster-api-cloudstack/deploys/64b1b1fa22dde80007b9dc8b |
Deploy Preview | https://deploy-preview-290--kubernetes-sigs-cluster-api-cloudstack.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Patch coverage has no change and project coverage change: -0.05
:warning:
Comparison is base (
4ccf853
) 25.29% compared to head (9f73dae
) 25.25%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
/run-e2e -c 4.18
/lgtm
/hold
/hold
The E2E don't seem to be getting kicked off?
/assign @vishesh92 @weizhouapache
@chrisdoherty4: GitHub didn't allow me to assign the following users: vishesh92.
Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
@chrisdoherty4 it's possible the backend BO script is facing Github rate limits (we found recently the script/jenkins jobs are running but couldn't post results as Github rate limit would void the API request to post comments for some reason)
/run-e2e help
@rohityadavcloud The command to run e2e test for CAPC.
Usage: /run-e2e [-k Kubernetes_Version] [-c CloudStack_Version] [-h Hypervisor] [-i Template/Image] [-f Kubernetes_Version_Upgrade_From] [-t Kubernetes_Version_Upgrade_To]
Examples:
/run-e2e -c 4.18
@rohityadavcloud a jenkins job has been kicked to run test with following paramaters:
Test Results : (tid-272) Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8 Kubernetes Version: v1.27.2 Kubernetes Version upgrade from: v1.26.5 Kubernetes Version upgrade to: v1.27.2 CloudStack Version: 4.18 Template: ubuntu-2004-kube E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr290-sl-272.zip
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing with disk offering Should successfully create a cluster with disk offering
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified control plane offering is not found [TC7]
[PASS] When the specified resource does not exist Should fail due to the specified template is not found [TC6]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the specified disk offering is not found
[PASS] When the specified resource does not exist Should fail due to the compute resources are not sufficient for the specified offering [TC8]
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is not customized but the disk size is specified
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is customized but the disk size is not specified
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade worker machine due to insufficient compute resources
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade control plane machine due to insufficient compute resources
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] When testing app deployment to the workload cluster with slow network [ToxiProxy] Should be able to download an HTML from the app deployed to the workload cluster
Summarizing 3 Failures:
[Fail] When testing affinity group [It] Should have host affinity group when affinity is pro
/jenkins/workspace/capc-e2e-new/test/e2e/common.go:331
[Fail] When testing resource cleanup [AfterEach] Should create a new network when the specified network does not exist
/jenkins/workspace/capc-e2e-new/test/e2e/resource_cleanup.go:101
[Fail] When testing app deployment to the workload cluster [TC1][PR-Blocking] [It] Should be able to download an HTML from the app deployed to the workload cluster
/jenkins/workspace/capc-e2e-new/test/e2e/deploy_app.go:111
Ran 28 of 29 Specs in 9307.217 seconds
FAIL! -- 25 Passed | 3 Failed | 0 Pending | 1 Skipped
--- FAIL: TestE2E (9307.23s)
FAIL
/run-e2e -c 4.18
@chrisdoherty4 a jenkins job has been kicked to run test with following paramaters:
/uncc @davidjumani
Test Results : (tid-273) Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8 Kubernetes Version: v1.27.2 Kubernetes Version upgrade from: v1.26.5 Kubernetes Version upgrade to: v1.27.2 CloudStack Version: 4.18 Template: ubuntu-2004-kube E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr290-sl-273.zip
[PASS] When testing with disk offering Should successfully create a cluster with disk offering
[PASS] When testing app deployment to the workload cluster [TC1][PR-Blocking] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified control plane offering is not found [TC7]
[PASS] When the specified resource does not exist Should fail due to the specified template is not found [TC6]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the specified disk offering is not found
[PASS] When the specified resource does not exist Should fail due to the compute resources are not sufficient for the specified offering [TC8]
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is not customized but the disk size is specified
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is customized but the disk size is not specified
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade worker machine due to insufficient compute resources
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade control plane machine due to insufficient compute resources
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing resource cleanup Should create a new network when the specified network does not exist
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing app deployment to the workload cluster with slow network [ToxiProxy] Should be able to download an HTML from the app deployed to the workload cluster
Summarizing 1 Failure:
[Fail] When testing affinity group [It] Should have host affinity group when affinity is pro
/jenkins/workspace/capc-e2e-new/test/e2e/common.go:331
Ran 28 of 29 Specs in 8523.486 seconds
FAIL! -- 27 Passed | 1 Failed | 0 Pending | 1 Skipped
--- FAIL: TestE2E (8523.49s)
FAIL
The failing affinity E2E is also failing on main so is not an error introduced by this change.
/unhold
AWS analyzed CAPC in high node count contexts and found it takes considerable time to scale clusters. Part of the issue stems from CloudStackMachine resources being reconciled serially. This change enables concurrent reconciliation of CloudStackMachine resources improving the efficiency and preventing other parts of the system from reacting to slowness.
I have tested these changes by scaling up and down a machine deployment from 1 to 11 nodes. Scale ups took comparable times (55s) vs serial reconciliation which is expected as most of the time is consumed by VM provisioning. Scale down had an 85% improvement from 1m57s to 27s.
Related #274