openshift-kni / lifecycle-agent

Local agent for orchestration of SNO Image Based Upgrade
Apache License 2.0

MGMT-17652: Possibly improve etcd performance through defrag/compaction #523

Closed tsorya closed 1 month ago

tsorya commented 1 month ago

Background / Context

When reconfiguring a seed image, we have to perform a lot of etcd delete operations. This leaves etcd in a fragmented state, which could harm its performance when it is later actually used for OCP and thus lengthen the IBX duration. Running the defrag command lets us minimize fragmentation in the etcd db, for example:

Without running defrag on the seed, I got a 68% difference between the db size and the size in use:

`[root@seed core]# oc exec etcd-seed -c etcd -n openshift-etcd -- etcdctl endpoint status --cluster -w json | jq '.[].Status|"dbSize: " + (.dbSize|tostring) + ", dbSizeInUse: " + (.dbSizeInUse|tostring) + ", (dbSize-dbSizeInUse)/dbSize => " + ((.dbSize - .dbSizeInUse)/.dbSize*100|tostring)+"%"'`

`"dbSize: 107253760, dbSizeInUse: 33562624, (dbSize-dbSizeInUse)/dbSize => 68.70727515753295%"`

With defrag run during seed creation, I got a 0% difference:

`[root@seed core]# oc exec etcd-seed -c etcd -n openshift-etcd -- etcdctl endpoint status --cluster -w json | jq '.[].Status|"dbSize: " + (.dbSize|tostring) + ", dbSizeInUse: " + (.dbSizeInUse|tostring) + ", (dbSize-dbSizeInUse)/dbSize => " + ((.dbSize - .dbSizeInUse)/.dbSize*100|tostring)+"%"'`

`"dbSize: 83079168, dbSizeInUse: 83079168, (dbSize-dbSizeInUse)/dbSize => 0%"`
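
The percentage reported by the jq expression is `(dbSize - dbSizeInUse) / dbSize * 100`. As a minimal sketch, assuming only a POSIX shell and awk, the same arithmetic can be wrapped in a small helper. The function name `frag_pct` is hypothetical and is not part of lifecycle-agent or etcdctl; the input values are the ones reported in the two runs.

```shell
#!/bin/sh
# frag_pct DB_SIZE DB_SIZE_IN_USE
# Hypothetical helper (not part of lifecycle-agent): prints the share of the
# etcd db file that is allocated but not in use, as a percentage with two
# decimals.
frag_pct() {
    awk -v size="$1" -v in_use="$2" 'BEGIN {
        # Avoid dividing by zero when no db size is reported.
        if (size <= 0) { print "0.00"; exit }
        printf "%.2f\n", (size - in_use) / size * 100
    }'
}

# Values taken from the two runs in this description.
frag_pct 107253760 33562624   # seed without defrag
frag_pct 83079168 83079168    # seed with defrag
```

With these inputs the helper prints 68.71 for the seed without defrag and 0.00 for the defragmented seed, matching the jq output after rounding.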

Issue / Requirement / Reason for change

Solution / Feature Overview

Run the etcd defrag command after recert runs during seed creation.

Implementation Details

Other Information

openshift-ci[bot] commented 1 month ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci-robot commented 1 month ago

@tsorya: This pull request references MGMT-17652 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

openshift-ci[bot] commented 1 month ago

@hexfusion: changing LGTM is restricted to collaborators

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: donpenney, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift-kni/lifecycle-agent/blob/main/OWNERS)~~ [donpenney]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment
donpenney commented 1 month ago

thanks @hexfusion @tsorya