Did some testing: I scaled a function deployment to 2 replicas, and then only the second (newer) replica received the topic messages. Is this a bug or expected behavior?
Indeed, the current events-based pods follow the queueing model approach. In the docs we use the 'pubsub' notation to express that the function will be triggered by PUBlishing messages to the SUBscribed functions (it doesn't refer to the messaging model, which may be confusing).
Every replica of the same function will use the same consumer group, so the messages will be balanced between them: only one pod will receive each message. This avoids issues when functions are not idempotent.
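To make the group semantics concrete, here is a minimal sketch using the kafka-python client (this is illustrative, not kubeless's code; the topic name, group ID, and broker address are placeholders). Running it twice with the same group_id reproduces the queueing behavior, while two consumers with different group_ids would each get a full copy of the stream:

```python
from kafka import KafkaConsumer

# All replicas of a function join the SAME consumer group, so Kafka
# assigns each partition to exactly one of them and every message is
# delivered to a single consumer (queue semantics).
consumer = KafkaConsumer(
    's3-python34',                            # placeholder topic
    group_id='pubsub.handler',                # same group -> messages balanced
    bootstrap_servers='kafka.kubeless:9092',  # placeholder broker address
)

for record in consumer:
    print(record.value)
```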
When checking this I noticed that the NodeJS runtime does not follow the same approach (it doesn't set the groupID). I will modify it to match the same behavior as the other languages.
If you want to achieve the pubsub model, you can still do it by specifying different handlers for the same function (the handler is the field used to generate the groupID). For example:
kubeless function deploy pubsub-python34 --trigger-topic s3-python34 --handler pubsub.handler
kubeless function deploy pubsub-python34-2 --trigger-topic s3-python34 --handler pubsub2.handler
The above will create two different groups, so both functions will each process every message published to the same topic.
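For reference, a minimal sketch of what the two handler modules could look like (assuming the event/context signature of the Python runtime, and that the payload arrives under 'data'; only the module name has to differ, since that is what produces a different groupID):

```python
# pubsub.py -- deployed with --handler pubsub.handler
# pubsub2.py is identical except for the file name; the different
# handler string yields a different groupID, so each function
# receives its own copy of every message on the topic.
def handler(event, context):
    print(event['data'])  # assumes the payload arrives under 'data'
    return event['data']
```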
Thanks for the detailed explanation, it's much clearer now.
I was using a python3.4 runtime, and the messages were indeed sent to only one of the consumers, but they stuck to one specific consumer rather than being round-robined/distributed among all the consumers.
According to your linked doc, if I got it right, Kafka achieves load balancing with consumer groups and partitions together: partitions are assigned to consumers, and messages are then distributed across partitions.
It seems that the behavior I'm observing is that only one partition is created and assigned to a specific consumer, so whatever I do, the messages only get sent to this specific consumer. Am I right?
Yes, the default behavior is that only one partition (and one consumer) receives the messages. It is not configured to distribute messages among all the consumers. However, if one pod dies, Kafka will keep sending messages to the other available pods.
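As a follow-up: to have messages spread across several replicas in the same group, the topic itself needs more than one partition, since each partition is assigned to a single consumer. A hedged sketch with kafka-python's admin client (broker address, topic name, and partition count are placeholders; for an already existing topic you would grow it with create_partitions instead):

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='kafka.kubeless:9092')  # placeholder

# With 4 partitions, up to 4 consumers in the same group can receive
# messages in parallel; Kafka spreads messages across partitions
# (by key hash, or round-robin when messages have no key).
admin.create_topics([
    NewTopic(name='s3-python34', num_partitions=4, replication_factor=1)
])
```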
I am closing this issue since there is no action to take here. Please @ccll reopen it if there is something else that needs clarification/fixing.
@andresmgot I also noticed that all the Kafka messages were sent to one single consumer after autoscaling. This is not the expected behavior of autoscaling, right?
@guangningyu you mean when scaling the function or the controller?
@andresmgot Thanks for your reply - I mean scaling the function. Here is what I did:
kubeless function deploy foo --cpu 4 --handler ...
kubeless trigger kafka create foo --function-selector ... --trigger-topic ...
kubeless autoscale create foo --min 1 --max 4 --metric cpu --value 70
Is this the expected behavior?
I see, yes.
Internally, what kubeless uses is the service endpoint to reach the function (something like my-func.default.svc.cluster.local). Then, Kubernetes networking decides which pod should get the traffic. If the first pod is not under stress, it may receive all the traffic. That may change and it can start rotating requests, but it's an "internal" decision.
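To illustrate, the delivery step amounts to a plain HTTP POST against the function's Service, with Kubernetes deciding which pod answers. A rough sketch (the URL, port, and headers are assumptions based on a typical kubeless setup, not the controller's actual code):

```python
import requests

message_value = b'{"hello": "world"}'  # placeholder Kafka message payload

# The trigger POSTs each consumed message to the function's Service
# endpoint; which pod handles it is decided by Kubernetes networking
# (kube-proxy), not by kubeless itself.
resp = requests.post(
    'http://my-func.default.svc.cluster.local:8080',  # assumed function port
    data=message_value,
    headers={'Content-Type': 'application/json'},
)
resp.raise_for_status()
```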
@andresmgot Got it. I managed to change the mode of kube-proxy from "iptables" to "ipvs", and the new pod started receiving traffic. Thanks.
I'm a newbie to Kafka messaging and found out in its docs that there are two models of messaging: queueing and publish-subscribe.
Since the kubeless docs call its event model 'pubsub', I guess it uses the 'pubsub' model of Kafka, right? Is there any way to achieve the 'queueing model' behavior in kubeless now?