operator-framework / ansible-operator-plugins

Experimental extraction/refactoring of the Operator SDK's ansible operator plugin
Apache License 2.0

Poor performance of Ansible operator after upgrade to v1.31.0 #17

Open jtruskow opened 9 months ago

jtruskow commented 9 months ago

Type of question

General operator-related help


Poor performance of Ansible operator v1.31.0

What did you do?

After updating the operator to the latest version (v1.31.0), I'm seeing a serious performance degradation: the reconcile loop takes ~5x longer to complete than on v1.30.0.

Everything remains the same except for the operator SDK version (and the change to Python 3.9). I followed the upgrade guide here: https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.31.0/

I don't think it's a bug in the operator-sdk because I installed the memcached operator using both v1.30.0 and v1.31.0 on my cluster and they perform similarly. I'm hoping to get some advice on how to debug this further.

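I've attached a file showing a diff of runtime for each task: [sdkv1.31.0_speech_timediffs.txt](https://github.com/operator-framework/operator-sdk/files/12652490/sdkv1.31.0_speech_timediffs.txt). The tasks are pretty consistently slower (no single task or small group of tasks is responsible for the slowdown).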
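For readers who want to check whether a slowdown is spread evenly across tasks, here is a minimal Python sketch of one way to summarize per-task timings from two runs. The task names and durations below are hypothetical placeholders, not values from the attached file, and the input format (a mapping of task name to seconds) is an assumption.

```python
from statistics import median


def slowdown_ratios(old, new):
    """Return {task: new_seconds / old_seconds} for tasks present in both runs."""
    return {
        task: new[task] / old[task]
        for task in old
        if task in new and old[task] > 0
    }


def summarize(ratios):
    """Summarize the spread of slowdown ratios across tasks."""
    values = sorted(ratios.values())
    return {"min": values[0], "median": median(values), "max": values[-1]}


# Hypothetical per-task durations in seconds (illustration only).
v1_30 = {"create configmap": 0.5, "create deployment": 1.0, "wait for rollout": 4.0}
v1_31 = {"create configmap": 2.5, "create deployment": 5.0, "wait for rollout": 20.0}

ratios = slowdown_ratios(v1_30, v1_31)
print(summarize(ratios))
```

If min, median, and max are close together, the slowdown is roughly uniform, which points at per-task overhead rather than at one expensive task.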


What did you expect to see?

Similar runtimes across tasks

What did you see instead? Under which circumstances?

Significant slowdown on latest version (v1.31.0)


Operator type:

/language ansible

Kubernetes cluster type:

OpenShift 4.10 and OpenShift 4.12

$ operator-sdk version

operator-sdk version: "v1.31.0", commit: "e67da35ef4fff3e471a208904b2a142b27ae32b1", kubernetes version: "1.26.0", go version: "go1.19.11", GOOS: "darwin", GOARCH: "arm64"

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.10+8c21020", GitCommit:"379f6fe03321f9149edea7f20e11ce88f8d99c25", GitTreeState:"clean", BuildDate:"2023-06-12T16:07:59Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

Additional context

everettraven commented 9 months ago

@jtruskow Thanks for raising this issue! v1.31.0 of the ansible-operator base image uses Ansible 2.15.0 instead of 2.9.z. I'm not familiar with the performance impact of that change, so I reached out to some folks I know who are involved with the Ansible project. They mentioned that the issue fixed by https://github.com/ansible/ansible/pull/81643 may be a culprit here, and that the fix is expected to release as part of Ansible 2.15.5 on October 9th.
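As a rough sketch, a check like the following could flag whether a given Ansible version string falls between the 2.15.0 in the v1.31.0 base image and the expected 2.15.5 fix. The range boundaries are assumptions taken from the comment above (the fix was only expected, not confirmed, at the time), and the helper names are hypothetical.

```python
import re

# Assumed boundaries, taken from the discussion above (not confirmed facts).
SUSPECT_START = (2, 15, 0)  # version shipped in the v1.31.0 base image
EXPECTED_FIX = (2, 15, 5)   # release expected to carry the fix


def parse_version(text):
    """Extract the first x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def in_suspect_range(text):
    """True if the version is >= 2.15.0 and earlier than the expected fix."""
    return SUSPECT_START <= parse_version(text) < EXPECTED_FIX


print(in_suspect_range("2.15.0"))  # version from the v1.31.0 base image
print(in_suspect_range("2.15.5"))  # expected fix release
print(in_suspect_range("2.9.27"))  # an older 2.9.z release
```

The version string would typically come from running `ansible --version` inside the operator image.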

jtruskow commented 9 months ago

@everettraven Awesome! That seems like a likely culprit. I assume once that is fixed upstream, we'll need to wait for another operator-sdk release to pick it up.

I wasn't able to find a roadmap. Is there any plan in the works for 1.31.1 or 1.32.0?

everettraven commented 9 months ago

There is a v1.32.0 release in the works, but I don't have an ETA for when that release is coming or whether it will include the fixed Ansible version.

openshift-ci[bot] commented 9 months ago

@jtruskow: The label(s) language/ansible cannot be applied, because the repository doesn't have them.
