knative-extensions / net-kourier

Purpose-built Knative Ingress implementation using just Envoy with no additional CRDs
Apache License 2.0

Kourier at large scale #941

Open daraghlowe opened 2 years ago

daraghlowe commented 2 years ago

What's the issue? We have started testing Kourier at large scale to see whether deployment times (the time for a KSVC to become ready to serve traffic) are better than with Istio. Deploy times with Kourier are good: a newly added KSVC consistently becomes ready in less than 10 seconds, all the way up to 2000 KSVCs on the cluster.

However, if you delete a KSVC and then add a new one, the new KSVC is much slower to become ready: even with only 500 KSVCs on the cluster, it takes several minutes.

Looking at the net-kourier-controller logs, you can see that it starts reconciling every Ingress on the cluster when you delete a KSVC. Presumably this has to finish before the Ingress for our new KSVC can be created.
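To illustrate why a cluster-wide resync could delay a brand-new Ingress, here is a toy model. It assumes (this is not confirmed from the controller code) that reconcile requests are handled one at a time in FIFO order, so a new item waits behind everything queued ahead of it. The function name and the per-item cost are hypothetical and not measured values.

```python
from collections import deque
from typing import Iterable


def seconds_until_processed(queue_items: Iterable[str],
                            target: str,
                            per_item_seconds: float) -> float:
    """Toy FIFO model: return the elapsed time when `target` finishes.

    Assumes a single worker and a fixed, made-up cost per reconcile.
    """
    queue = deque(queue_items)
    elapsed = 0.0
    while queue:
        item = queue.popleft()
        # Every item ahead of the target adds its full cost to the wait.
        elapsed += per_item_seconds
        if item == target:
            return elapsed
    raise ValueError(f"{target!r} was never enqueued")


# 500 existing Ingresses are re-enqueued by the delete, then the new one arrives.
items = [f"ingress-{i}" for i in range(500)] + ["new-ingress"]
wait = seconds_until_processed(items, "new-ingress", per_item_seconds=0.25)
```

Under this assumption the wait grows linearly with the number of existing Ingresses, which would match the observation that the slowdown only shows up after a delete.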

Why is this a problem? This leads to inconsistent deploy times for our workloads, which creates an inconsistent user experience: sometimes a deploy is really quick, and other times it can take minutes to become ready.

Results Here are the times it took for a single KSVC to become ready right after I deleted a different single KSVC, alongside the number of KSVCs that were on the cluster at the time.

image

Why are we doing this? We are running a cluster with Knative and Istio with 1500 KSVCs and have started to run into a problem with the time it takes for newly added KSVCs (specifically their ingress) to become ready.

We opened an issue for this here: https://github.com/knative/serving/issues/13247

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

dprotaso commented 1 year ago

/lifecycle frozen

norbjd commented 1 year ago

Hello :wave:,

We have noticed this kind of inconsistent deploy time on our Knative clusters too. As of now, this is the main reason we're running multiple clusters, each with only between 400 and 500 ksvc. Above that, we start seeing some slowness, mostly ksvc taking a while to become ready.

I've started investigating based on @daraghlowe's example, on a simple kind cluster, measuring the time for the kingress resources to become ready. Here's my experiment, with the latest kourier version (main as of 2023-07-22: 85c062d):

  1. create a single Pod and a Service
  2. create 2000 Ingresses sequentially, all pointing to the Service created in the first step
  3. delete the first Ingress
  4. create 10 new Ingresses sequentially
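The four steps above could be driven by a small harness like the sketch below. It is not the script used for this experiment: `create`, `delete` and `is_ready` are hypothetical callbacks that you would back with your own cluster client, and the `ingress-<n>` names are made up for illustration.

```python
import time
from typing import Callable, List, Tuple


def time_to_ready(create: Callable[[str], None],
                  is_ready: Callable[[str], bool],
                  name: str,
                  poll_interval: float = 0.05,
                  timeout: float = 60.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> float:
    """Create one object and return the seconds until is_ready(name) is true."""
    start = clock()
    create(name)
    while not is_ready(name):
        if clock() - start > timeout:
            raise TimeoutError(f"{name} not ready after {timeout}s")
        sleep(poll_interval)
    return clock() - start


def run_experiment(create: Callable[[str], None],
                   delete: Callable[[str], None],
                   is_ready: Callable[[str], bool],
                   total: int = 2000,
                   extra: int = 10,
                   **timing) -> Tuple[List[float], List[float]]:
    """Step 2: create `total` objects one by one; step 3: delete the first;
    step 4: create `extra` more. Returns the durations before and after the delete."""
    before = [time_to_ready(create, is_ready, f"ingress-{i}", **timing)
              for i in range(total)]
    delete("ingress-0")
    after = [time_to_ready(create, is_ready, f"ingress-{total + i}", **timing)
             for i in range(extra)]
    return before, after
```

The `clock` and `sleep` parameters are injectable so the harness can be dry-run against a fake cluster before pointing it at a real one.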

For the first ingresses (up to 1200), each ingress becomes ready in less than 1s, 95% of the time. With more ingresses, this time increases, up to 2 seconds. The more ingress objects we have, the longer it takes for an ingress to become ready, but it's always less than or equal to 2 seconds, so it's not that bad. See this plot showing the percentage of ingress creations taking between 1 and 2 seconds, according to the total number of ingresses:
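For reference, the percentage described above can be computed from the raw time-to-ready values with something like this sketch. The function name, the bucket size of 200, and treating both bounds as inclusive are my assumptions, not necessarily what was used for the plot.

```python
from typing import Dict, Sequence


def share_between(durations: Sequence[float],
                  low: float = 1.0,
                  high: float = 2.0,
                  bucket_size: int = 200) -> Dict[int, float]:
    """Percentage of creations whose duration falls in [low, high].

    durations[i] is the time-to-ready of the (i+1)-th ingress created.
    Keys are the total number of ingresses at the end of each bucket.
    """
    shares: Dict[int, float] = {}
    for start in range(0, len(durations), bucket_size):
        bucket = durations[start:start + bucket_size]
        slow = sum(1 for d in bucket if low <= d <= high)
        shares[start + len(bucket)] = 100.0 * slow / len(bucket)
    return shares
```

Plotting the returned values against their keys gives the same kind of curve: the share of 1-2 second creations as the total number of ingresses grows.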

image

These results might be normal, I don't know the intricacies.

But, once I had 2000 ingresses, and after deleting an existing ingress (3rd step), the next ingress creation (4th step) took between 7 and 8 seconds. The next ones were consistent with the results I showed before, between 1 and 2 seconds.

I'm still not sure why we got that big "time-to-ready" duration increase (from 1-2s to 7-8s) just after deleting an ingress, but from an outside perspective, adding an ingress should always take the same time to be ready.

I could not reproduce @daraghlowe's numbers, because I only focused on ingresses here; there are obviously other things configured when we create a ksvc (revision, configuration, etc.).

I'll continue to investigate, but I thought it was worth posting this first experiment, as it could prompt more discussion.