tpiperatgod opened this issue 2 years ago
With the new scaler, Function Mesh can downscale the replicas of a function to 0.
+1 on this. I was also thinking about using KEDA when discussing the relationship between image size and spin-up duration / faster dynamic scaling among the advantages of distroless: https://github.com/streamnative/function-mesh/issues/448
Regarding KEDA, this is a good introduction: https://medium.com/backstagewitharchitects/how-autoscaling-works-in-kubernetes-why-you-need-to-start-using-keda-b601b483d355 (the embedded video is also interesting).
There is already a blog post saying that KEDA may be a future direction (at the end of https://streamnative.cn/blog/engineering/2022-01-19-auto-scaling-pulsar-functions-in-kubernetes-using-custom-metrics-zh/).
Of course, in some/many use cases the possibility to easily autoscale to zero would help a lot with infrastructure costs...
Function Mesh's function instances can be dynamically scaled by the HPA based on CPU and memory metrics. However, Function Mesh is not yet able to scale to/from zero replicas. This proposal aims to provide a solution that implements this feature.
Provide the ability to scale the function instances of Function Mesh to/from zero replicas.
I propose to introduce the KEDA project as the basic solution for scaling Function Mesh's function instances to/from zero replicas. The advantage of this approach is that Function Mesh's event engine is Pulsar, and KEDA already has a Pulsar scaler, which can use Pulsar's message backlog as the metric for function scaling.
Structure for scaling configurations:
type AdvanceScaleConfig struct {
	Driver   string            `json:"driver,omitempty"`   // Indicates the driver for the scaler; available: "keda"
	Topics   []string          `json:"topics,omitempty"`   // Indicates the topics used to trigger the scaler
	Strategy map[string]string `json:"strategy,omitempty"` // Indicates the trigger strategy
}
Example:
spec:
  advanceScaleConfig:
    driver: keda
    topics:
      - persistent://public/default/my-topic-1
      - persistent://public/default/my-topic-2
    strategy:
      msgBacklogThreshold: 10
      activationMsgBacklogThreshold: 2
      pollingInterval: 30
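To make the strategy fields concrete, here is a rough, non-authoritative illustration of how these thresholds would behave, assuming KEDA's usual semantics: activationMsgBacklogThreshold only controls the 0 -> 1 activation, and once the function is running, the HPA sizes it from the backlog and msgBacklogThreshold (with several triggers, the largest result wins). The backlog value below is made up for the example:

package main

import (
	"fmt"
	"math"
)

// Illustration only: mirrors the standard KEDA/HPA calculation for an
// external metric with an AverageValue target, using the thresholds above.
func main() {
	backlog := 35.0   // assumed current message backlog on the subscription
	threshold := 10.0 // msgBacklogThreshold
	activation := 2.0 // activationMsgBacklogThreshold

	if backlog <= activation {
		// KEDA keeps (or scales) the workload at 0 replicas.
		fmt.Println("function stays at 0 replicas")
		return
	}
	// Once active, the HPA targets ceil(backlog / threshold) replicas.
	replicas := int(math.Ceil(backlog / threshold))
	fmt.Println("desired replicas:", replicas) // prints 4 for a backlog of 35
}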
According to the definition of the KEDA Pulsar scaler, a single trigger handles only one topic, so if the Function has multiple input topics (spec.inputs), the operator will generate a trigger for each topic.
Example of KEDA ScaledObject resource for the above configuration:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet # the target defaults to Deployment, so the kind must be set for the function's StatefulSet
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
    - type: pulsar
      metadata:
        adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
        topic: persistent://public/default/my-topic-1
        subscription: sub1 # Get from spec.SubscriptionName
        msgBacklogThreshold: '10'
        activationMsgBacklogThreshold: '2'
    - type: pulsar
      metadata:
        adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
        topic: persistent://public/default/my-topic-2
        subscription: sub1 # Get from spec.SubscriptionName
        msgBacklogThreshold: '10'
        activationMsgBacklogThreshold: '2'
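As a minimal sketch of the trigger generation described above (the helper and types here are made up for illustration, not the actual operator code), the operator could expand spec.inputs into the trigger list roughly like this:

package main

import "fmt"

// Trigger is a simplified stand-in for a KEDA trigger entry (type + metadata).
type Trigger struct {
	Type     string
	Metadata map[string]string
}

// makeTriggers builds one Pulsar trigger per input topic, copying the
// user-supplied strategy keys into the trigger metadata. pollingInterval is
// skipped because it belongs on the ScaledObject itself.
func makeTriggers(adminURL, subscription string, topics []string, strategy map[string]string) []Trigger {
	triggers := make([]Trigger, 0, len(topics))
	for _, topic := range topics {
		md := map[string]string{
			"adminURL":     adminURL,     // from spec.pulsar.pulsarConfig
			"topic":        topic,        // one trigger per input topic
			"subscription": subscription, // from spec.SubscriptionName
		}
		for k, v := range strategy {
			if k != "pollingInterval" {
				md[k] = v
			}
		}
		triggers = append(triggers, Trigger{Type: "pulsar", Metadata: md})
	}
	return triggers
}

func main() {
	triggers := makeTriggers(
		"http://localhost:80",
		"sub1",
		[]string{
			"persistent://public/default/my-topic-1",
			"persistent://public/default/my-topic-2",
		},
		map[string]string{
			"msgBacklogThreshold":           "10",
			"activationMsgBacklogThreshold": "2",
			"pollingInterval":               "30",
		},
	)
	for _, t := range triggers {
		fmt.Printf("%s %v\n", t.Type, t.Metadata)
	}
}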
Example configuration of the auth section. If the following is configured in the Function:
spec:
  pulsar:
    tlsConfig:
      enabled: true
      allowInsecure: true
      certSecretName: "ca-name"
      certSecretKey: "ca-key"
then the corresponding KEDA resources would look like this:
apiVersion: v1
kind: Secret
metadata:
  name: <function-name>-keda-tls-secrets
  namespace: <function-namespace>
data:
  cert: "ca-name"
  key: "ca-key"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: <function-name>-keda-trigger-auth-pulsar-credential
  namespace: <function-namespace>
spec:
  secretTargetRef:
    - parameter: cert
      name: <function-name>-keda-tls-secrets
      key: cert
    - parameter: key
      name: <function-name>-keda-tls-secrets
      key: key
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet # the target defaults to Deployment, so the kind must be set for the function's StatefulSet
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
    - type: pulsar
      metadata:
        tls: "enable"
        adminURL: https://localhost:8443
        topic: persistent://public/default/my-topic
        subscription: sub1
        msgBacklogThreshold: '5'
      authenticationRef:
        name: <function-name>-keda-trigger-auth-pulsar-credential
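Purely as a sketch of the operator-side construction (the helper name and wiring are hypothetical; only the resource shapes are taken from the example above), the Secret and TriggerAuthentication could be built from the Function's tlsConfig like this:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// kedaTLSResources is a hypothetical helper that mirrors the example above:
// it copies tlsConfig.certSecretName/certSecretKey into a Secret that is then
// consumed by a KEDA TriggerAuthentication.
func kedaTLSResources(fnName, fnNamespace, certSecretName, certSecretKey string) (*corev1.Secret, *unstructured.Unstructured) {
	secretName := fnName + "-keda-tls-secrets"

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: secretName, Namespace: fnNamespace},
		Data: map[string][]byte{
			"cert": []byte(certSecretName), // from tlsConfig.certSecretName, as in the example
			"key":  []byte(certSecretKey),  // from tlsConfig.certSecretKey
		},
	}

	triggerAuth := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "keda.sh/v1alpha1",
		"kind":       "TriggerAuthentication",
		"metadata": map[string]interface{}{
			"name":      fnName + "-keda-trigger-auth-pulsar-credential",
			"namespace": fnNamespace,
		},
		"spec": map[string]interface{}{
			"secretTargetRef": []interface{}{
				map[string]interface{}{"parameter": "cert", "name": secretName, "key": "cert"},
				map[string]interface{}{"parameter": "key", "name": secretName, "key": "key"},
			},
		},
	}}
	return secret, triggerAuth
}

func main() {
	secret, triggerAuth := kedaTLSResources("my-function", "default", "ca-name", "ca-key")
	fmt.Println(secret.Name, triggerAuth.GetName())
}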
The state-machine diagram is here.
Of course, in some/many use cases the possibility to easily autoscale to zero would help a lot with infrastructure costs...
Hi @hpvd, it seems you are interested in this development. May I take the liberty of asking what company you work for? Also, what kind of use cases are you using Function Mesh for?
@tpiperatgod thanks for your question. We are still incubating our new company ;-) It's in the field of mechanical engineering... We are looking into Pulsar for streaming, but also for high-load, on-demand batch processing. Because of the latter, and because we and our customers don't (always) work 24/7, scaling to zero is more than nice to have... (yes, we could work with crons, but that is not flexible and the number of rules keeps growing...) Besides this, we are interested in strong security for everything, and of course in the main features of Pulsar, like great performance, built-in geo-replication and functions, and relatively low effort for ongoing maintenance...
Oh, I see. So for now you're worried about two things.
And the community is working on these issues.
You are welcome to participate in building the community.
Thanks for your warm words. Yes, there has been a lot of great progress and there are many good things on the way... e.g.
and also
These two points may be interesting for the testing and release of this new functionality:
Previously in KEDA, when scaling from 0 to 1, KEDA would “activate” (scale to 1) a resource when any activity happened on that event source. For example, if using a queue, a single message on the queue would trigger activation and scale.
As of this release, we now allow you to set an activationThreshold for many scalers which is the metric that must be hit before scaling to 1.
This would allow you to delay scaling up to 1 until n number of messages were unprocessed. This pairs with other thresholds and target values for scaling from 1 to n instances, where the HPA will scale out to n instances based on the current event metric and the defined threshold values.
Details on thresholds and the new activation thresholds can be found in the KEDA concept docs
see https://keda.sh/blog/2022-08-10-keda-2.8.0-release/
(but not sure if this will happen) see https://github.com/kedacore/keda/blob/main/ROADMAP.md
KEDA 2.9 was released: https://github.com/kedacore/keda/blob/main/CHANGELOG.md#v291
The autoscaling of Function Mesh's resources is currently controlled by the HPA.
We can add some Pulsar-specific metrics to the HPA to determine whether the target workload needs to be scaled.
Here are two approaches:
What do you think?