knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Theoretical maximum number of Knative Services for a single cluster #13201

Open rhuss opened 2 years ago

rhuss commented 2 years ago

Ask your question here:

According to the Kubernetes scalability thresholds, there is an upper limit on the number of Kubernetes Services, based on the maximum number of iptables entries on a node. Currently, this limit is 10,000 services (if I understand correctly, it is independent of the number of nodes in the cluster, since every node needs to carry the same iptables routing rules).

Since every Knative Service translates into at least 3 Kubernetes Services (1 ExternalName service pointing to the ingress gateway, plus 2 services per revision (public/private)), the theoretical maximum would be ~3,350 Knative Services per cluster (and far fewer when using multiple revisions and/or running other workloads on the cluster besides Knative).

My questions would be:

rhuss commented 2 years ago

For reference, I found this very interesting blog post, which points to this presentation on Scaling Kubernetes to Support 50,000 Services. That approach uses IPVS; I have no idea whether IPVS is supported by Kubernetes out of the box.

psschwei commented 2 years ago

I think it is (though I don't have any experience with it): https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs
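
For reference, IPVS mode is enabled through the kube-proxy configuration rather than anything Knative-specific. A minimal sketch of that cluster-level setting (assumes the IPVS kernel modules are available on every node):

```yaml
# Minimal sketch: switching kube-proxy from iptables to IPVS mode.
# IPVS keeps service rules in a hash table, which scales better than
# the linear iptables chains as the number of Services grows.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; other schedulers are available
```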

porcelli commented 2 years ago

This is a significant limitation for the Knative serverless use case. If you consider that in a serverless environment perhaps only ~5% of registered services are active at any time, the other 95% still consume the cluster's Service capacity while sitting idle. I would love to hear how others are experiencing scale with Knative serverless.

ggaaooppeenngg commented 2 years ago

In some environments, clusters are initialized with a limited service CIDR, in addition to the iptables limit, so the maximum number of services is even smaller than 10,000. If we supported headless services, we could get rid of this limitation. I propose offering a configuration option to turn on headless mode so that no cluster IP is assigned to the service (see the sketch below).
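
For reference, a minimal sketch of what such a headless Service looks like (the name, label, and ports are hypothetical):

```yaml
# A headless Service: clusterIP: None means no virtual IP is allocated from
# the service CIDR and kube-proxy programs no iptables/IPVS rules for it;
# clients discover the pod IPs via DNS or the Endpoints object instead.
apiVersion: v1
kind: Service
metadata:
  name: example-headless       # hypothetical name
spec:
  clusterIP: None
  selector:
    app: example               # hypothetical label
  ports:
  - port: 80
    targetPort: 8080
```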

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

rhuss commented 1 year ago

/remove-lifecycle stale

dprotaso commented 1 year ago

there is an upper limit on the number of Kubernetes Services, based on the maximum number of iptables entries on a node

If this is accurate, then I think we can set ClusterIP: None on our private k8s service. That private service is only used for endpoint collection/tracking.

It's worth exploring in v1.12

dprotaso commented 1 year ago

So it could be that you have an empty cluster (everything scaled to zero), but your routing tables are still exhausted?

Network programming is the slowest part, so it's a trade-off between keeping these resources around and cold starts.

Another optimization could be to spin down child resources for Revisions that are not reachable.

zetaab commented 10 months ago

This is also interesting: https://render.com/blog/knative

It seems that Render made some fixes themselves and removed the services. I am just wondering whether that could be supported somehow for big Knative clusters? We have a similar situation: we always use load balancers (with an ingress controller in front), so at least 2 of those services seem useless for us?

dprotaso commented 9 months ago

It seems that Render made some fixes themselves and removed the services

Render were able to do this because their free tier only runs one pod per Knative Service - which is per tenant.

The private services are a means of collecting the endpoints of the revision pods, which we then use in the public service. The public service is also where we wire in the activator when the pods scale to zero or when you need extra burst capacity. (A rough sketch follows at the end of this comment.)

Alternatively, Knative could do the endpoint collection itself, but then we'd be copying Kubernetes behaviour - unsure if we want to go down that path unless there's a lot of benefit.

I think dropping the ClusterIP would help a lot - someone just needs to open a PR and test it out. @zetaab, do you have other concerns with the private Knative Service?
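
To make the roles concrete, here is a hedged, simplified sketch of the two per-revision Services described above; the names, labels, ports, and selector are illustrative, not necessarily what Knative's reconciler literally creates:

```yaml
# Public service: no selector - Knative manages its Endpoints itself,
# pointing either at the revision pods or at the activator; today it
# still gets a ClusterIP allocated.
apiVersion: v1
kind: Service
metadata:
  name: hello-00001            # illustrative revision name
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8012
---
# Private service: selects the revision pods and is used only to collect
# their endpoints. Setting clusterIP: None here is the optimization being
# proposed in this thread.
apiVersion: v1
kind: Service
metadata:
  name: hello-00001-private    # illustrative name
spec:
  clusterIP: None
  selector:
    serving.knative.dev/revision: hello-00001   # illustrative selector
  ports:
  - name: http
    port: 80
    targetPort: 8012
```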

ggaaooppeenngg commented 9 months ago

If the backend is a single endpoint, or the client side can handle the load balancing, maybe the ClusterIP of the public service can be dropped as well, like Render did. What about setting the private service's ClusterIP to None and adding an option for the public service to be headless?

dprotaso commented 9 months ago

The public service is already headless and we manage those endpoints ourselves.

What about setting the private service's ClusterIP to None

I was suggesting this here - https://github.com/knative/serving/issues/13201#issuecomment-1680608428

Someone just needs to do that work :)

ggaaooppeenngg commented 9 months ago

@dprotaso If headless means manually managing the endpoints, then I think it is. But the ClusterIP of the public service is not None, so it will still consume a service IP if I understand correctly.

dprotaso commented 9 months ago

But the ClusterIP of the public service is not None, so it will still consume a service IP if I understand correctly.

Yeah, I was referring to setting the ClusterIP to None for the private service.

dprotaso commented 5 months ago

@rhuss see @izabelacg's change - we disabled the ClusterIP on the private service. I believe it should no longer consume any iptables entries. Can you confirm?

With that change, we are technically using the private k8s service for endpoint collection only.

Would it be a desirable goal to reduce the number of Kubernetes Services attached to a Knative Service (like down to 1:1)?

I don't think so - k8s services are considered the 'frontend' for routing so we'll always need one per revision.

We could remove the private service and do the collection ourselves - but that adds a ton of complexity to Serving and I'm not sure it's worth it at the moment.

rhuss commented 5 months ago

Thanks a lot; that is definitely a vast improvement. Is there a way to indicate that some revisions should not be routable? E.g. if you don't leverage a traffic split, or don't want to allow access to older revisions (e.g. when they contain bugs that are resolved by newer revisions). If so, I think that would be perfect, because then we could always argue that if you "just" want autoscaling without any revisioning, you consume as many services as you would without autoscaling (i.e. when using a vanilla K8s Deployment & Service).
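
For reference, a hedged sketch of the no-split case described above, where the traffic block only ever points at the latest revision (the service name and image are placeholders):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                  # placeholder name
spec:
  template:
    spec:
      containers:
      - image: ghcr.io/knative/helloworld-go:latest   # placeholder image
  traffic:
  - latestRevision: true       # traffic always follows the newest revision
    percent: 100               # no split; older revisions receive no traffic
```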

dprotaso commented 4 months ago

Note - we had to revert the ClusterIP changes because they broke the autoscaler's pod/ClusterIP scraping.