joowani / kq

Kafka-based Job Queue for Python
http://kq.readthedocs.io
MIT License
572 stars 24 forks

Synchronising multiple workers with the same queue #9

Closed k53sc closed 6 years ago

k53sc commented 6 years ago

Hello,

I am trying to solve the following use case using 2 workers and a job queue.

For simplicity, let's call our workers 'worker A' and 'worker B'.

My problem: when 'worker A' sleeps after printing its message to the console, I expect 'worker B' to take over, but that is not happening.

queue.enqueue(custom_func, '1')
queue.enqueue(custom_func, '2')
queue.enqueue(custom_func, '3')
queue.enqueue(custom_func, '4')
queue.enqueue(custom_func, '5')  # after execution of this step, worker A prints and then sleeps
queue.enqueue(custom_func, '6')
queue.enqueue(custom_func, '7')
queue.enqueue(custom_func, '8')
queue.enqueue(custom_func, '9')
queue.enqueue(custom_func, '10')

Instead, I see 'worker A' waking up after 5 seconds and processing the rest of the jobs itself.
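For reference, `custom_func` is just a stand-in here; a minimal version consistent with the behaviour described above (print the message, then sleep for 5 seconds to simulate a slow job) would be:

```python
import time


def custom_func(msg):
    # Print the enqueued message, then block for 5 seconds,
    # simulating a slow job that should let the other worker take over.
    print('processing message:', msg)
    time.sleep(5)
```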

My setup:

My workers are executing the following code

import logging

from kafka import KafkaConsumer
from kq import Worker

# Set up logging.
formatter = logging.Formatter('[%(levelname)s] %(message)s')
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger = logging.getLogger('kq.worker')
logger.setLevel(logging.DEBUG)
logger.addHandler(stream_handler)

# Set up a Kafka consumer.
consumer = KafkaConsumer(
    bootstrap_servers='127.0.0.1:9092',
    group_id='group',
    auto_offset_reset='latest'
)

# Set up a worker.
worker = Worker(topic='topic', consumer=consumer)
worker.start()

I have two terminals running two such processes, which I assume play the roles of 'worker A' and 'worker B'.

joowani commented 6 years ago

Hi @k53sc,

This may be due to your Kafka configuration. How many partitions have you set up for your topic "topic"? You need at least as many partitions as workers if you want them to work in parallel.
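To illustrate why: within a consumer group, each partition is assigned to exactly one consumer, so with a single partition only one worker ever receives messages while the other sits idle. A simplified sketch of the assignment logic (round-robin for illustration only, not Kafka's actual assignor):

```python
def assign_partitions(consumers, num_partitions):
    """Toy round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        # A partition is never shared between consumers in the same group.
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# With 1 partition and 2 workers, worker B gets nothing:
print(assign_partitions(['worker-a', 'worker-b'], 1))
# With 2 partitions, each worker gets one partition and they run in parallel:
print(assign_partitions(['worker-a', 'worker-b'], 2))
```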

k53sc commented 6 years ago

Okay. Is it an absolute requirement to have

at least as many partitions as you have number of workers if you want them to work in parallel.

I want to add workers at run-time, so I don't know how many partitions I will need when I start. Is there any way all workers can share the same partition? I would want all the workers to pick up data from a common partition.

Can Kafka be used for this kind of use case? Do you have any suggestions?

Thanks in Advance.

joowani commented 6 years ago

Yes, it is a requirement by design (of Kafka itself, not of KQ). The question you are asking, "how many partitions should I use?", is a popular one. You will probably need to do your own research, but the most frequent answer I've seen on the web is "it depends on your use case". Here is a start.
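For example, a topic with two partitions (enough for two parallel workers) could be created with Kafka's CLI; the exact script name and flags depend on your Kafka version (newer brokers use `--bootstrap-server localhost:9092` instead of `--zookeeper`):

```shell
# Create the topic "topic" with 2 partitions so that two workers
# in the same consumer group can consume in parallel.
kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 2 \
    --topic topic
```

You can also add partitions to an existing topic later with `--alter --partitions N`, but note that partitions can only be increased, never decreased.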

A KQ worker is essentially a light wrapper on top of a Kafka consumer. I suggest you first gain a solid grasp of how Kafka consumers, partitions and brokers work (your questions suggest that you are new to these concepts) before using KQ.

Cheers.

k53sc commented 6 years ago

Thanks @joowani , let me build some understanding around the concept.

Thanks for your help