knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Add pod diagnostics before scaling down to zero in scaler #15326

Open skonto opened 3 months ago

skonto commented 3 months ago

Fixes #14157

Proposed Changes

Steps to reproduce: run the ksvc, let it scale down to zero, then remove the revision image from the local registry. Disable network access so the image cannot be fetched, and issue a new request.
The status of the Serving resources will then become:

{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "serving.knative.dev/v1",
            "kind": "Service",
            "metadata": {
                "annotations": {
                    ....
                },
            "status": {
                "address": {
                    "url": "http://hello.default.svc.cluster.local"
                },
                "conditions": [
                    {
                        "lastTransitionTime": "2024-06-12T13:02:06Z",
                        "message": "Revision \"hello-00001\" failed with message: Initial scale was never achieved.",
                        "reason": "RevisionFailed",
                        "status": "False",
                        "type": "ConfigurationsReady"
                    },
                    {
                        "lastTransitionTime": "2024-06-12T13:02:05Z",
                        "message": "Revision \"hello-00001\" failed to become ready.",
                        "reason": "RevisionMissing",
                        "status": "False",
                        "type": "Ready"
                    },
                    {
                        "lastTransitionTime": "2024-06-12T13:02:05Z",
                        "message": "Revision \"hello-00001\" failed to become ready.",
                        "reason": "RevisionMissing",
                        "status": "False",
                        "type": "RoutesReady"
                    }
                ],
            }
}
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "serving.knative.dev/v1",
            "kind": "Revision",
            "metadata": {
                "annotations": {
                    "serving.knative.dev/creator": "minikube-user",
                    "serving.knative.dev/progress-deadline": "45s",
                    "serving.knative.dev/routes": "hello",
                    "serving.knative.dev/routingStateModified": "2024-06-12T12:57:33Z"
                },
                ...

        "status": {
            "actualReplicas": 0,
            "conditions": [
                {
                    "lastTransitionTime": "2024-06-12T13:02:06Z",
                    "message": "The target is not receiving traffic.",
                    "reason": "NoTraffic",
                    "severity": "Info",
                    "status": "False",
                    "type": "Active"
                },
                {
                    "lastTransitionTime": "2024-06-12T12:57:51Z",
                    "status": "True",
                    "type": "ContainerHealthy"
                },
                {
                    "lastTransitionTime": "2024-06-12T13:02:06Z",
                    "message": "Initial scale was never achieved",
                    "reason": "ProgressDeadlineExceeded",
                    "status": "False",
                    "type": "Ready"
                },
                {
                    "lastTransitionTime": "2024-06-12T13:02:06Z",
                    "message": "Initial scale was never achieved",
                    "reason": "ProgressDeadlineExceeded",
                    "status": "False",
                    "type": "ResourcesAvailable"
                }
            ],

...
}
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "serving.knative.dev/v1",
            "kind": "Configuration",
            "metadata": {
                ...
                "name": "hello",
                "namespace": "default",
                ...
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2024-06-12T13:02:06Z",
                        "message": "Revision \"hello-00001\" failed with message: Initial scale was never achieved.",
                        "reason": "RevisionFailed",
                        "status": "False",
                        "type": "Ready"
                    }
                ],
                ...
}
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "autoscaling.internal.knative.dev/v1alpha1",
            "kind": "PodAutoscaler",

            ...
            "spec": {
                "protocolType": "http1",
                "reachability": "Reachable",
                "scaleTargetRef": {
                    "apiVersion": "apps/v1",
                    "kind": "Deployment",
                    "name": "hello-00001-deployment"
                }
            },
            "status": {
                "actualScale": 0,
                "conditions": [
                    {
                        "lastTransitionTime": "2024-06-12T13:02:05Z",
                        "message": "The target is not receiving traffic.",
                        "reason": "NoTraffic",
                        "status": "False",
                        "type": "Active"
                    },
                    {
                        "lastTransitionTime": "2024-06-12T13:02:05Z",
                        "message": "The target is not receiving traffic.",
                        "reason": "NoTraffic",
                        "status": "False",
                        "type": "Ready"
                    },
                    {
                        "lastTransitionTime": "2024-06-12T12:58:51Z",
                        "message": "K8s Service is not ready",
                        "reason": "NotReady",
                        "status": "Unknown",
                        "type": "SKSReady"
                    },
                    {
                        "desiredScale": 0,
                        "metricsServiceName": "hello-00001-private",
                        "observedGeneration": 1,
                        "serviceName": "hello-00001"
                    }
            }
    ],
}
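
To make the dumps above easier to scan, here is a minimal Go sketch that lists every condition whose status is not "True". It is only an illustration: the `Condition` struct mirrors just the fields visible in the output above, and `resourceStatus`, `notTrue`, and the trimmed `revisionStatus` sample are hypothetical names introduced here, not part of Knative's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Condition mirrors only the fields visible in the status dumps above.
type Condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// resourceStatus is a hypothetical wrapper holding just the "conditions" array.
type resourceStatus struct {
	Conditions []Condition `json:"conditions"`
}

// notTrue returns every condition whose status is anything other than "True".
func notTrue(statusJSON []byte) ([]Condition, error) {
	var s resourceStatus
	if err := json.Unmarshal(statusJSON, &s); err != nil {
		return nil, err
	}
	var out []Condition
	for _, c := range s.Conditions {
		if c.Status != "True" {
			out = append(out, c)
		}
	}
	return out, nil
}

// revisionStatus is a trimmed copy of the Revision conditions shown above
// (timestamps and severity omitted).
const revisionStatus = `{
  "conditions": [
    {"type": "Active", "status": "False", "reason": "NoTraffic", "message": "The target is not receiving traffic."},
    {"type": "ContainerHealthy", "status": "True"},
    {"type": "Ready", "status": "False", "reason": "ProgressDeadlineExceeded", "message": "Initial scale was never achieved"},
    {"type": "ResourcesAvailable", "status": "False", "reason": "ProgressDeadlineExceeded", "message": "Initial scale was never achieved"}
  ]
}`

func main() {
	conds, err := notTrue([]byte(revisionStatus))
	if err != nil {
		panic(err)
	}
	for _, c := range conds {
		fmt.Printf("%s=%s reason=%s message=%q\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

For the Revision above this reports Active, Ready, and ResourcesAvailable. None of them names the image pull failure that actually caused the problem.
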

After the image is restored, a new request works as expected and the resource statuses return to normal.
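
In this reproduction the underlying failure is an image that cannot be pulled, which Kubernetes reports on the pod as a container in a waiting state (for example with reason `ImagePullBackOff` or `ErrImagePull`). The sketch below shows one way a "pod diagnostics" check of the kind named in the title could look. It is an assumption-laden illustration and not the code in this PR: the types are local stand-ins that flatten the Kubernetes `ContainerStatus.State.Waiting` shape, and `firstWaitingReason` is a hypothetical helper.

```go
package main

import "fmt"

// Local stand-ins for the Kubernetes pod status types, kept minimal so the
// sketch is self-contained. In k8s.io/api/core/v1 the waiting state lives
// under ContainerStatus.State.Waiting; it is flattened here for brevity.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStatus struct {
	Name    string
	Waiting *ContainerStateWaiting // nil when the container is not waiting
}

type Pod struct {
	Name              string
	ContainerStatuses []ContainerStatus
}

// firstWaitingReason (hypothetical helper) returns the reason and message of
// the first container found in a waiting state, and false if none is waiting.
func firstWaitingReason(pods []Pod) (reason, message string, found bool) {
	for _, p := range pods {
		for _, cs := range p.ContainerStatuses {
			if cs.Waiting != nil {
				return cs.Waiting.Reason, cs.Waiting.Message, true
			}
		}
	}
	return "", "", false
}

func main() {
	// Illustrative pod; the names and message are made up for the example.
	pods := []Pod{{
		Name: "hello-00001-deployment-example",
		ContainerStatuses: []ContainerStatus{{
			Name:    "user-container",
			Waiting: &ContainerStateWaiting{Reason: "ImagePullBackOff", Message: "Back-off pulling image"},
		}},
	}}
	if reason, msg, ok := firstWaitingReason(pods); ok {
		fmt.Printf("pod diagnostic before scale to zero: %s: %s\n", reason, msg)
	}
}
```

A caller could copy the returned reason and message into a status condition before scaling down, so that the resources report the pull failure instead of only "Initial scale was never achieved".
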
**Release Note**

<!-- Enter your extended release note in the below block. If the PR requires
additional action from users switching to the new release, include the string
"action required". If no release note is required, write "NONE". -->

```release-note
```

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 21.21212% with 26 lines in your changes missing coverage. Please review.

Project coverage is 84.60%. Comparing base (62ce45c) to head (248d6e8). Report is 144 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/reconciler/autoscaling/kpa/scaler.go | 21.05% | 13 Missing and 2 partials :warning: |
| pkg/resources/pods.go | 0.00% | 7 Missing :warning: |
| pkg/apis/autoscaling/v1alpha1/pa_lifecycle.go | 0.00% | 2 Missing :warning: |
| pkg/reconciler/autoscaling/kpa/kpa.go | 60.00% | 1 Missing and 1 partial :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main   #15326      +/-   ##
==========================================
- Coverage   84.76%   84.60%   -0.16%
==========================================
  Files         218      218
  Lines       13504    13534      +30
==========================================
+ Hits        11447    11451       +4
- Misses       1690     1713      +23
- Partials      367      370       +3
```


skonto commented 3 months ago

error: the server doesn't have a resource type "ksvc"

skonto commented 3 months ago

/retest

knative-prow[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: skonto
Once this PR has been reviewed and has the lgtm label, please ask for approval from dprotaso. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/knative/serving/blob/main/OWNERS)~~ [skonto]
- **[pkg/apis/OWNERS](https://github.com/knative/serving/blob/main/pkg/apis/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

skonto commented 3 months ago

@dprotaso gentle ping.

skonto commented 3 months ago

@dprotaso gentle ping.

github-actions[bot] commented 5 days ago

This Pull Request is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen with /reopen. Mark as fresh by adding the comment /remove-lifecycle stale.

skonto commented 3 days ago

/remove-lifecycle stale