Test long running requests in Serving

skonto commented 1 week ago

Fixes JIRA #

Proposed Changes

Test for #3017 #3016
Introduces the serving.knative.openshift.io/setRouteTimeout ksvc annotation so that the user can set the timeout per route. This aligns with the fact that a ksvc can set a timeout per revision. We don't use the haproxy annotation directly because it might change in the future, and in an upgrade scenario we don't want to have to update the ksvc annotations.
Uses mesh mode since RHAI uses that. The test is not run with every PR because it takes more than 10 minutes.
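
To illustrate the indirection described above, here is a minimal sketch of how a ksvc-level annotation could be translated into the router-level one. It is not the code from this PR: the helper name `routeAnnotations` is hypothetical, and the value format (a plain number of seconds, emitted with an `s` suffix) is an assumption. `haproxy.router.openshift.io/timeout` is the OpenShift router's route timeout annotation.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Annotation introduced by this PR on the Knative Service.
	setRouteTimeoutAnnotation = "serving.knative.openshift.io/setRouteTimeout"
	// Timeout annotation understood by the OpenShift (HAProxy) router on a Route.
	haproxyTimeoutAnnotation = "haproxy.router.openshift.io/timeout"
)

// routeAnnotations is a hypothetical helper. It assumes the ksvc annotation
// holds a positive number of seconds and returns the annotations that would
// be set on the generated Route.
func routeAnnotations(ksvc map[string]string) (map[string]string, error) {
	out := map[string]string{}
	v, ok := ksvc[setRouteTimeoutAnnotation]
	if !ok {
		// No per-route timeout requested: leave the router default in place.
		return out, nil
	}
	secs, err := strconv.Atoi(v)
	if err != nil || secs <= 0 {
		return nil, fmt.Errorf("invalid %s value %q", setRouteTimeoutAnnotation, v)
	}
	out[haproxyTimeoutAnnotation] = strconv.Itoa(secs) + "s"
	return out, nil
}

func main() {
	got, err := routeAnnotations(map[string]string{setRouteTimeoutAnnotation: "800"})
	fmt.Println(got, err)
}
```

Keeping the translation in one place means that a change on the router side only touches this mapping, while existing ksvc annotations stay as they are.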

Tested with Kourier locally:


go test -v -failfast -timeout=30m -parallel=1 ./test/servinge2e/servicemesh/longrunning --kubeconfigs=...
=== RUN   TestTimeoutForLongRunningRequests
time="2024-11-19T23:34:32+02:00" level=info msg="Loading kube client config from path \"/.....\""
W1119 23:34:33.007228  193306 warnings.go:70] Kubernetes default value is insecure, Knative may default this to secure in a future release: spec.template.spec.containers[0].securityContext.allowPrivilegeEscalation, spec.template.spec.containers[0].securityContext.capabilities, spec.template.spec.containers[0].securityContext.runAsNonRoot, spec.template.spec.containers[0].securityContext.seccompProfile
spoof.go:111: Spoofing longrunning-serverless-tests.apps.ci-ln-glppxsb-76ef8.aws-2.ci.openshift.org -> longrunning-serverless-tests.apps.ci-ln-glppxsb-76ef8.aws-2.ci.openshift.org
service.go:61: Cleaning up Knative Service 'serverless-tests/longrunning'
--- PASS: TestTimeoutForLongRunningRequests (642.17s)
PASS
ok      github.com/openshift-knative/serverless-operator/test/servinge2e/servicemesh/longrunning    642.195s

skonto commented 1 week ago

/test ?

openshift-ci[bot] commented 1 week ago

@skonto: The following commands are available to trigger required jobs:

/test 413-images
/test 413-operator-e2e-aws-413
/test 413-test-upgrade-aws-413
/test 417-aws-ovn-images
/test 417-azure-images
/test 417-gcp-images
/test 417-hypershift-images
/test 417-images
/test 417-operator-e2e-aws-417
/test 417-osd-images
/test 417-single-node-images
/test 417-test-upgrade-aws-417
/test 417-vsphere-images
/test 418-images
/test 418-operator-e2e-aws-418
/test 418-test-upgrade-aws-418
/test ocp4.17-lp-rosa-classic-images
/test ocp4.18-lp-interop-images

The following commands are available to trigger optional jobs:

/test 413-kitchensink-e2e-aws-413
/test 413-kitchensink-upgrade-aws-413
/test 413-mesh-e2e-aws-413
/test 413-mesh-upgrade-aws-413
/test 413-test-soak-aws-413
/test 413-ui-e2e-aws-413
/test 413-upstream-e2e-aws-413
/test 413-upstream-e2e-kafka-aws-413
/test 417-kitchensink-e2e-aws-417
/test 417-kitchensink-upgrade-aws-417
/test 417-mesh-e2e-aws-417
/test 417-mesh-upgrade-aws-417
/test 417-test-soak-aws-417
/test 417-ui-e2e-aws-417
/test 417-upstream-e2e-aws-417
/test 417-upstream-e2e-kafka-aws-417
/test 418-kitchensink-e2e-aws-418
/test 418-kitchensink-upgrade-aws-418
/test 418-mesh-e2e-aws-418
/test 418-mesh-upgrade-aws-418
/test 418-test-soak-aws-418
/test 418-ui-e2e-aws-418
/test 418-upstream-e2e-aws-418
/test 418-upstream-e2e-kafka-aws-418

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-knative-serverless-operator-main-417-images
pull-ci-openshift-knative-serverless-operator-main-417-operator-e2e-aws-417
pull-ci-openshift-knative-serverless-operator-main-417-test-upgrade-aws-417
pull-ci-openshift-knative-serverless-operator-main-417-upstream-e2e-aws-417
pull-ci-openshift-knative-serverless-operator-main-417-upstream-e2e-kafka-aws-417

In response to [this](https://github.com/openshift-knative/serverless-operator/pull/3038#issuecomment-2488180469):

>/test ?

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

skonto commented 1 week ago

/test 417-mesh-e2e-aws-417

skonto commented 1 week ago

11:44:26.122 SUCCESS: 🌟 Tests have passed 🌟
ingresscontroller.operator.openshift.io/default patched
knativeserving.operator.knative.dev/knative-serving patched
Running go test with args: -race -count=1 -tags=e2e -failfast -timeout=60m -parallel=1 ./test/servinge2e/servicemesh/longrunning --kubeconfigs /tmp/kubeconfig-1609998190,/go/src/github.com/openshift-knative/serverless-operator/user1.kubeconfig,/go/src/github.com/openshift-knative/serverless-operator/user2.kubeconfig,/go/src/github.com/openshift-knative/serverless-operator/user3.kubeconfig --imagetemplate {{- with .Name }}
{{- if eq . "httpproxy" }}quay.io/openshift-knative/serving/{{.}}:v1.15
{{- else if eq . "autoscale" }}quay.io/openshift-knative/serving/{{.}}:v1.15
{{- else if eq . "recordevents" }}quay.io/openshift-knative/eventing/{{.}}:v1.15
{{- else if eq . "wathola-forwarder" }}quay.io/openshift-knative/eventing/{{.}}:v1.15
{{- else if eq . "kafka" }}quay.io/strimzi/kafka:latest-kafka-3.4.0
{{- else }}quay.io/openshift-knative/{{.}}:multiarch{{end -}}
{{end -}}
PASS test/servinge2e/servicemesh/longrunning.TestTimeoutForLongRunningRequests (640.28s)
PASS test/servinge2e/servicemesh/longrunning

skonto commented 1 week ago

upstream mesh tests fail:

spoof.go:181: Retrying https://readiness-alternate-port-icngkohe-serving-tests.apps.serverless-ocp-4-17-amd64-aws-us-east-1-5nwb2.serverless.devcluster.openshift.com: retrying for certificate signed by unknown authority: Get "https://readiness-alternate-port-icngkohe-serving-tests.apps.serverless-ocp-4-17-amd64-aws-us-east-1-5nwb2.serverless.devcluster.openshift.com": tls: failed to verify certificate: x509: certificate signed by unknown authority

skonto commented 1 week ago

/retest

skonto commented 1 week ago

/test 417-test-upgrade-aws-417

skonto commented 1 week ago

Mesh tests passed, including the test in this PR. If the upgrade tests don't pass, I will update the PR.

skonto commented 1 week ago

Ready to be merged.

skonto commented 1 week ago

@matzew hi, could you stamp this one?

mgencur commented 1 week ago

Hey @skonto, I didn't think this test was supposed to be merged since it takes a long time. Would it be possible to decrease all the values instead? This would have the same effect. IMO:

max-revision-timeout-seconds - 180s
revision-response-start-timeout-seconds - 170s
revision-timeout-seconds - 180s
routeTimeout - 180s
sleepTime - 120000 (120s)

Then you would not have to increase the LoadBalancer timeouts in AWS and the test would run faster. My modified version of the test uses the values above and it works.

Aside from that, should we use autoscaling.TargetBurstCapacityKey: "-1", for the service to make sure Activator is on the path?

skonto commented 1 week ago

@mgencur It is only in the mesh tests, which we run on demand (I put it there intentionally). The idea was to mimic what people asked for in the ticket: a timeout beyond the default (600s).

>Aside from that, should we use autoscaling.TargetBurstCapacityKey: "-1", for the service to make sure Activator is on the path?

With default values in the ksvc (as in this case), and when we scale from zero, the activator is always on the path. We would only need that setting if we did load testing.

mgencur commented 1 week ago

We also run this periodically, like here: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-knative-serverless-operator-main-413-mesh-e2e-aws-413-c/1859764914193698816 Anyway, it's not a big concern. Given the lack of time, I suppose it's fine to have the long test for now. Maybe in the future shorter values could be used because:

There's already a unit test that verifies that the setting for the OpenShift route is propagated.
There could be another short test that verifies that exceeding the timeout settings for the Route and the Revision actually makes the request fail.

Both of these tests could run in under 2 minutes. Just some notes.

mgencur commented 1 week ago

/lgtm

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mgencur, skonto

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift-knative/serverless-operator/blob/main/OWNERS)~~ [mgencur,skonto]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

skonto commented 1 week ago

@mgencur ok let's revise this for 1.36 :)

openshift-knative / serverless-operator

Test long running requests in Serving #3038
