operator-framework / ansible-operator-plugins

Experimental extraction/refactoring of the Operator SDK's ansible operator plugin
Apache License 2.0

Poor performance of Ansible operator after upgrade to v1.31.0 #17

Open jtruskow opened 9 months ago

jtruskow commented 9 months ago

Type of question

General operator-related help

Question

Poor performance of Ansible operator v1.31.0

What did you do?

Since updating the operator to the latest version (v1.31.0), I'm seeing serious performance degradation: the reconcile loop takes about 5x longer to complete than on v1.30.0.

Nothing else changed except the Operator SDK version (and the switch to Python 3.9). I followed the upgrade guide here: https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.31.0/

I don't think it's a bug in the operator-sdk because I installed the memcached operator using both v1.30.0 and v1.31.0 on my cluster and they perform similarly. I'm hoping to get some advice on how to debug this further.
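One way to gather per-task timings for a before/after comparison is a timing callback such as `ansible.posix.profile_tasks`, which prints a summary of task durations at the end of a run. The minimal sketch below parses such a summary, assuming lines shaped like `Task name ------ 1.23s`; that format is an assumption, so check it against your actual output. `SUMMARY_LINE` and `parse_task_timings` are hypothetical names.

```python
from __future__ import annotations

import re

# Assumed summary-line shape: "<task name> ---...--- <seconds>s"
SUMMARY_LINE = re.compile(r"^(?P<task>.+?)\s+-{3,}\s+(?P<secs>\d+(?:\.\d+)?)s\s*$")


def parse_task_timings(text: str) -> dict[str, float]:
    """Map task name -> seconds for every line matching the assumed format."""
    timings: dict[str, float] = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            task = match.group("task")
            # If a task name appears more than once, accumulate its total time.
            timings[task] = timings.get(task, 0.0) + float(match.group("secs"))
    return timings
```

Lines that do not fit the pattern (separator rows, per-task timestamps) are simply skipped, so the whole log can be passed in as-is.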

I've attached a file showing the runtime difference for each task. Tasks are consistently slower across the board; no single task or small group of tasks accounts for the slowdown.
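To check whether a slowdown is spread evenly across tasks or concentrated in a few, a minimal sketch can compute the after/before ratio per task and summarize it. It assumes both runs are available as a mapping of task name to seconds; `compare_task_times`, `summarize`, and the `threshold` parameter are hypothetical.

```python
from __future__ import annotations

from statistics import median


def compare_task_times(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return the after/before runtime ratio for each task timed in both runs."""
    ratios: dict[str, float] = {}
    for task, old in before.items():
        new = after.get(task)
        # Skip tasks missing from the newer run or with a zero baseline.
        if new is None or old <= 0:
            continue
        ratios[task] = new / old
    return ratios


def summarize(ratios: dict[str, float], threshold: float = 2.0) -> dict[str, float]:
    """Summarize how evenly the slowdown is spread across tasks."""
    values = sorted(ratios.values())
    if not values:
        raise ValueError("no tasks in common between the two runs")
    return {
        "median": median(values),
        "min": values[0],
        "max": values[-1],
        # Fraction of tasks that got at least `threshold` times slower.
        "share_slow": sum(v >= threshold for v in values) / len(values),
    }
```

A high minimum ratio together with `share_slow` close to 1.0 matches the pattern described here, where every task slows down rather than a few outliers.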

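sdkv1.31.0_speech_timediffs.txt (https://github.com/operator-framework/operator-sdk/files/12652490/sdkv1.31.0_speech_timediffs.txt)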

What did you expect to see?

Similar runtimes across tasks

What did you see instead? Under which circumstances?

Significant slowdown on latest version (v1.31.0)

Environment

Operator type:

/language ansible

Kubernetes cluster type:

Openshift 4.10 and Openshift 4.12

$ operator-sdk version

operator-sdk version: "v1.31.0", commit: "e67da35ef4fff3e471a208904b2a142b27ae32b1", kubernetes version: "1.26.0", go version: "go1.19.11", GOOS: "darwin", GOARCH: "arm64"

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.10+8c21020", GitCommit:"379f6fe03321f9149edea7f20e11ce88f8d99c25", GitTreeState:"clean", BuildDate:"2023-06-12T16:07:59Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

Additional context

everettraven commented 9 months ago

@jtruskow Thanks for raising this issue! v1.31.0 of the ansible-operator base image uses Ansible 2.15.0 instead of 2.9.z. I'm not familiar with the performance impact of that change, so I reached out to some folks involved with the Ansible project. They mentioned that https://github.com/ansible/ansible/pull/81643 may be the culprit and that the fix is expected to be released as part of Ansible 2.15.5 on October 9th.
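To tell whether a given base image already includes that fix, a small sketch can compare its ansible-core version string against 2.15.5, the release named above. In ansible-core the installed version is exposed as `ansible.release.__version__` (and printed by `ansible --version`); the function below takes the string directly so it runs without Ansible installed. `FIX_VERSION`, `parse_version`, and `has_fix` are hypothetical helpers, and pre-release suffixes such as `rc1` are ignored, so `2.15.5rc1` would be treated as containing the fix.

```python
from __future__ import annotations

import re

# Release expected to carry ansible/ansible#81643, per this thread.
FIX_VERSION = (2, 15, 5)


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like '2.15.5' or '2.15.5rc1'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_fix(version: str) -> bool:
    """True if the version is at or above the expected fix release."""
    return parse_version(version) >= FIX_VERSION
```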

jtruskow commented 9 months ago

@everettraven Awesome! That seems like a likely culprit. I assume once that is fixed upstream, we'll need to wait for another operator-sdk release to pick it up.

I wasn't able to find a roadmap; is there a plan for 1.31.1 or 1.32.0?

everettraven commented 9 months ago

There is a v1.32.0 release in the works, but I don't have an ETA for when it will be released or when it might include the fixed Ansible version.

openshift-ci[bot] commented 9 months ago

@jtruskow: The label(s) language/ansible cannot be applied, because the repository doesn't have them.

In response to [this](https://github.com/operator-framework/ansible-operator-plugins/issues/17).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.