vmware-archive / kafka-trigger

Kubernetes CRD controller for Kafka topic as event source for Kubeless functions
Apache License 2.0

[Question] kafka-trigger controller implementation tightly coupled with queue message processing? #24

Closed juliohm1978 closed 4 years ago

juliohm1978 commented 4 years ago

I'm in the process of implementing a custom PulsarTrigger for an internal use case, so that we can trigger functions from Pulsar queue events. Given our lack of Golang expertise, we decided to use the Kubernetes Python client API.

Looking at the Kafka Trigger Controller as an example, I noticed that message processing actually happens inside the controller code, launched as a separate thread. The controller creates a consumer thread for each trigger associated with a function.

I wonder how this works for a large-scale messaging system. The message-processing code runs in the same process as the trigger controller, so the two are tightly coupled. At large scale, the mere act of creating a new trigger could easily bring message processing to a momentary halt.

In this architecture, is it possible to scale triggers horizontally? I'm not sure I see how.
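To make the concern concrete, here is a minimal Python sketch of the coupled pattern described above: one process that both manages triggers and runs a consumer thread per trigger. It is not the Kafka trigger's actual Go code. The class `InProcessTriggerController` and its methods are hypothetical names, and a `queue.Queue` stands in for the message broker.

```python
import queue
import threading


class InProcessTriggerController:
    """Hypothetical controller that runs one consumer thread per trigger."""

    def __init__(self):
        # trigger name -> (consumer thread, stop event)
        self.consumers = {}

    def add_trigger(self, name, source, handler):
        stop = threading.Event()

        def consume():
            # Poll the message source until the trigger is removed.
            while not stop.is_set():
                try:
                    message = source.get(timeout=0.1)
                except queue.Empty:
                    continue
                handler(message)  # in Kubeless this would call the function
                source.task_done()

        thread = threading.Thread(target=consume, name=f"consumer-{name}", daemon=True)
        thread.start()
        self.consumers[name] = (thread, stop)

    def remove_trigger(self, name):
        thread, stop = self.consumers.pop(name)
        stop.set()
        thread.join()
```

Every consumer lives in the controller's own process, so adding controller replicas would duplicate consumers rather than spread them out, and any stall or restart of the controller affects all triggers at once.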

andresmgot commented 4 years ago

Yes, some people have reported issues regarding the scalability of the Kafka trigger and its performance. See:

https://github.com/kubeless/kubeless/issues/826

It's a known issue but we never had the time to implement a better architecture.

juliohm1978 commented 4 years ago

Thank you.

This gives us insight into how to proceed. I will decouple the controller and implement the message processing as a different pod, launched and managed by the controller. This allows the message processing to scale independently.
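A rough sketch of what "launched and managed by the controller" could look like: for each trigger, the controller builds a Deployment manifest for a dedicated consumer pod. This is only an illustration under assumptions, not the published implementation. The function name, the image `example.org/pulsar-consumer:latest`, the labels, and the `TOPIC` / `FUNCTION_URL` environment variables are all hypothetical.

```python
def consumer_deployment(trigger_name, topic, function_url, replicas=1,
                        image="example.org/pulsar-consumer:latest"):
    """Build a Deployment manifest (plain dict) for one trigger's consumer.

    All names, labels, and env vars here are hypothetical placeholders.
    """
    labels = {"app": "pulsar-consumer", "trigger": trigger_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{trigger_name}-consumer", "labels": labels},
        "spec": {
            # Scaling message processing means changing this number,
            # independently of the controller itself.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "consumer",
                        "image": image,
                        "env": [
                            {"name": "TOPIC", "value": topic},
                            {"name": "FUNCTION_URL", "value": function_url},
                        ],
                    }]
                },
            },
        },
    }
```

With the Kubernetes Python client, a dict like this can be passed as the `body` of `AppsV1Api().create_namespaced_deployment(namespace, body)`. The controller then only reconciles Deployments and never touches messages itself.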

If I publish the code on GitHub, would it make sense to use a different Group Name for the CRD?

andresmgot commented 4 years ago

> I will decouple the controller and implement the message processing as a different pod, launched and managed by the controller. This allows the message processing to scale independently.

Sounds good :+1:

> If I publish the code on GitHub, would it make sense to use a different Group Name for the CRD?

I would keep the group as kubeless.io to tie it to the Kubeless ecosystem but feel free to modify it if you need to.
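For illustration, the group shows up in two places in a CRD: `spec.group` and the CRD's own name, which Kubernetes requires to be `<plural>.<group>`. The sketch below builds such a manifest as a plain dict. The plural `pulsartriggers`, the version `v1beta1`, and the permissive schema are assumptions for the example, not taken from the published project.

```python
def trigger_crd(kind, plural, group="kubeless.io", version="v1beta1"):
    """Build a CustomResourceDefinition manifest as a plain dict.

    The version name and the open schema are illustrative assumptions.
    """
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": version,
                "served": True,
                "storage": True,
                # Accept any fields; a real CRD would declare its spec here.
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "x-kubernetes-preserve-unknown-fields": True,
                }},
            }],
        },
    }
```

Calling `trigger_crd("PulsarTrigger", "pulsartriggers")` yields a CRD named `pulsartriggers.kubeless.io`; passing a different `group` changes both the name and the `apiVersion` that custom resources would use.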

juliohm1978 commented 4 years ago

I'm dropping v1.0.0 as a first draft over here: https://juliohm1978.github.io/kubeless-pulsar-trigger/

I'm not sure where I can get more people to check it out, besides Reddit. Our team is hoping to use this in some internal projects, so hopefully I can get some feedback to improve it.

Thank you for your help in responding to these issues!

andresmgot commented 4 years ago

awesome, that was fast :)