knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.57k stars 1.16k forks source link

[WIP] fix: http probe downgrades from http2 to http1 when it encounters an error #15436

Open Cali0707 opened 3 months ago

Cali0707 commented 3 months ago

Fixes #15432

Proposed Changes

Release Note

knative-prow[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Cali0707 Once this PR has been reviewed and has the lgtm label, please assign dprotaso for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/knative/serving/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
Cali0707 commented 3 months ago

/cc @dprotaso

I made a start here, but I think I am still missing something...

Specifically, if the error on the options request is because the service isn't ready yet we should try the options request again. However, I wasn't sure the best way to do that and/or figure that out. Maybe it makes sense to track how many times the options request fails before we actually set the HTTP version on the probe config? That way, it would retry the options request a few times

WDYT?

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 84.66%. Comparing base (222065d) to head (cd205ce). Report is 100 commits behind head on main.

Files with missing lines Patch % Lines
pkg/queue/health/probe.go 25.00% 3 Missing and 3 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #15436 +/- ## ========================================== + Coverage 84.54% 84.66% +0.11% ========================================== Files 219 219 Lines 13595 13600 +5 ========================================== + Hits 11494 11514 +20 + Misses 1734 1710 -24 - Partials 367 376 +9 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Cali0707 commented 3 months ago

I've tried to update it so that until a probe succeeds for either http1 or http2, we keep retrying the options request. This seems better, but there is still a chance of a race condition where:

  1. the options request fails because the service is not accepting requests yet
  2. the http1 probe we try succeeds because now the service started accepting requests
github-actions[bot] commented 2 weeks ago

This Pull Request is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen with /reopen. Mark as fresh by adding the comment /remove-lifecycle stale.