project-codeflare / rayvens

Rayvens makes it possible for data scientists to access hundreds of data services within Ray with little effort.
Apache License 2.0
43 stars 7 forks source link

Add consumer scaling for Kafka transport for sources #27

Closed doru1004 closed 2 years ago

doru1004 commented 2 years ago

Add consumer scaling for Kafka transport for sources.

This means that when using the Kafka transport with a partitioned topic and the number of partitions in the topic is greater than 1, the number of started Kafka consumers ingesting events from the event source will be equal to the number of partitions in the Kafka topic (used by the Kafka transport).

The confluent Kafka consumers are wrapped in a KafkaConsumer Ray actor. KafkaConsumer actor is set to use a minimal percentage of a CPU i.e. 0.05. The KafkaConsumer actor is meant to be a thin, low compute intensity wrapper to the operations performed by the subscribers. When running in a cluster the number of KafkaConsumer actors will be adjusted by Ray depending on the number of available resources.

The subscribers are fixed at the moment the source is added to the stream.