openshift / cluster-etcd-operator

Operator to manage the lifecycle of the etcd members of an OpenShift cluster
Apache License 2.0
96 stars 130 forks source link

ETCD-349: add etcd cachedmembers #1208

Closed Elbehery closed 8 months ago

Elbehery commented 8 months ago

resolves https://issues.redhat.com/browse/ETCD-349

openshift-ci-robot commented 8 months ago

@Elbehery: This pull request references ETCD-349 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1208): >resolves https://issues.redhat.com/browse/ETCD-349 > > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-etcd-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
dusk125 commented 8 months ago

So since pretty much all external functions call the refresh function (which write-locks the mutex), it's very unlikely that this will be able to properly use the read mutexes. So essentially you have a write-lock mutex on every function.

I wonder if it would be better to start a goroutine that write-locks and updates the cache (calling refresh) on a timer instead; since you're only updating the cache after a certain period. That way, each of the external functions would just read-lock and could actually access the data at the same time.

tjungblu commented 8 months ago

@dusk125 alternatively, I think it could be a normal mutex. This is not necessary a performance improvement, merely an attempt at reducing the amount of RPC calls to etcd. The remote calls take much longer than the write lock would be taken (just a few instructions to check whether it would need a refresh).

Timer is also cool with me, but needs to have some additional error handling to bubble up when a cache is known to be outdated by more than a minute.

dusk125 commented 8 months ago

Sounds good Thomas, then a normal mutex would be good, keep it simple! :D

Elbehery commented 8 months ago

/label tide/merge-method-squash

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

the failing jobs are due to other failures, not related to this PR

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

@tjungblu ci/prow/e2e-operator-fips failure not related to this PR

shall we override ?

tjungblu commented 8 months ago

why? it's not required, or is it? :rofl:

Elbehery commented 8 months ago

why? it's not required, or is it? 🤣

doch doch, it is required .. but the failure has nothing to do with this change https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1208/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator-fips/1764984733453783040

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

tjungblu commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

tjungblu commented 8 months ago

/override ci/prow/e2e-operator-fips

openshift-ci[bot] commented 8 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Elbehery, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/cluster-etcd-operator/blob/master/OWNERS)~~ [Elbehery,tjungblu] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
openshift-ci[bot] commented 8 months ago

@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-operator-fips

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1208#issuecomment-1983767761): >/override ci/prow/e2e-operator-fips Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci-robot commented 8 months ago

/retest-required

Remaining retests: 0 against base HEAD 2f6e1dec53a275fb2d11049203986cd79d8c747d and 2 for PR HEAD 62dcc73a23c18444b6813ab080efea5e99f748ec in total

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

Elbehery commented 8 months ago

/retest-required

openshift-ci[bot] commented 8 months ago

@Elbehery: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-etcd-recovery 62dcc73a23c18444b6813ab080efea5e99f748ec link false /test e2e-aws-etcd-recovery
ci/prow/e2e-gcp-qe-no-capabilities 62dcc73a23c18444b6813ab080efea5e99f748ec link false /test e2e-gcp-qe-no-capabilities

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
tjungblu commented 8 months ago

/override ci/prow/e2e-agnostic-ovn-upgrade

tjungblu commented 8 months ago

/override ci/prow/e2e-operator-fips

openshift-ci[bot] commented 8 months ago

@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-agnostic-ovn-upgrade

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1208#issuecomment-1988259561): >/override ci/prow/e2e-agnostic-ovn-upgrade Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci[bot] commented 8 months ago

@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-operator-fips

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1208#issuecomment-1988259835): >/override ci/prow/e2e-operator-fips Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-bot commented 8 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-etcd-operator-container-v4.16.0-202403111814.p0.gfd67e61.assembly.stream.el9 for distgit cluster-etcd-operator. All builds following this will include this PR.