openshift / api

Canonical location of the OpenShift API definition.
http://www.openshift.org
Apache License 2.0

OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec #2102

Open frobware opened 3 days ago

frobware commented 3 days ago

Introduce a new knob, IdleConnectionTerminationPolicy, in the IngressController configuration to control how idle connections are handled during router reloads.
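As a rough illustration of what such a knob could look like in Go, here is a minimal sketch. The value names `Immediate` and `Deferred`, the cut-down `IngressControllerSpec` struct, and the `Validate` helper are assumptions made for this example; the description above names only the field, not its allowed values.

```go
package main

import "fmt"

// IdleConnectionTerminationPolicy is a hypothetical string enum for the new knob.
// The value names below are assumptions for illustration only.
type IdleConnectionTerminationPolicy string

const (
	// TerminateImmediately: idle connections are closed as soon as the router reloads.
	TerminateImmediately IdleConnectionTerminationPolicy = "Immediate"
	// TerminateDeferred: idle connections may serve one final request before closing.
	TerminateDeferred IdleConnectionTerminationPolicy = "Deferred"
)

// IngressControllerSpec is a cut-down stand-in, not the real openshift/api type.
type IngressControllerSpec struct {
	IdleConnectionTerminationPolicy IdleConnectionTerminationPolicy `json:"idleConnectionTerminationPolicy,omitempty"`
}

// Validate rejects values outside the assumed enum.
// An empty value is treated as "use the platform default".
func (s IngressControllerSpec) Validate() error {
	switch s.IdleConnectionTerminationPolicy {
	case "", TerminateImmediately, TerminateDeferred:
		return nil
	default:
		return fmt.Errorf("unsupported idleConnectionTerminationPolicy %q", s.IdleConnectionTerminationPolicy)
	}
}

func main() {
	spec := IngressControllerSpec{IdleConnectionTerminationPolicy: TerminateDeferred}
	if err := spec.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("valid policy:", spec.IdleConnectionTerminationPolicy)
}
```

In the real API, validation of this kind would more likely be expressed through kubebuilder enum markers on the field than through a hand-written method.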

Context

In OCPBUGS-32044, the idle-close-on-response option was unconditionally added to the HAProxy configuration to address issues with incoming HTTP requests failing during router reloads. This issue primarily affected Apache HTTPClient versions prior to 5.0, which do not gracefully handle connection resets. Adding the option ensured that idle connections were left open to handle one final request before being closed.

Historically, HAProxy 2.2 maintained idle connections during router reloads by default, allowing requests on those connections to complete even when routing configuration changes were applied. Starting with HAProxy 2.4, the default behaviour changed to close idle connections immediately during soft reloads.

Customers upgrading their OpenShift clusters experienced this behaviour change because of the jump from HAProxy 2.2 to 2.6, which altered the default handling of idle connections during router reloads. To accommodate existing clients dependent on the HAProxy 2.2 behaviour, idle-close-on-response was added unconditionally, restoring the previous OpenShift status quo.

However, unconditionally enabling idle-close-on-response has now led to issues (OCPBUGS-43745) with Route backend switching. When a Route switches its service backend, requests on persistent connections could continue being routed to the previously active backend due to HAProxy handling these connections in the old process. This behaviour occurs until the connection is closed, either by a new request, the expiration of the client keep-alive, or the expiration of the HAProxy timeout http-keep-alive 300s. While this behaviour is desirable in some cases (e.g., for clients sensitive to connection resets), it can lead to temporary inconsistencies and unexpected routing behaviour during backend switching.

This PR addresses these regressions by making the behaviour configurable through a new knob.

Changes

Behavioural Differences

References:

openshift-ci[bot] commented 3 days ago

Hello @frobware! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci[bot] commented 3 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: frobware

Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift/api/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci-robot commented 3 days ago

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/api/pull/2102):

>- **OCPBUGS-43745: make generate-with-container**
>- **OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerTuningOptions**

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fapi). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

frobware commented 3 days ago

/jira refresh

openshift-ci-robot commented 3 days ago

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to [this](https://github.com/openshift/api/pull/2102#issuecomment-2483596232):

>/jira refresh

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fapi). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 20 hours ago

@frobware: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/integration | 84689bf6752251547541a87d3cfb891f9c6add29 | link | true | `/test integration` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).