robinhood / faust

Python Stream Processing

[Question] why not use aiokafka and kafka-python directly #271

Open europelee opened 5 years ago

europelee commented 5 years ago

I wonder why Faust uses its own forks of aiokafka and kafka-python instead of the upstream packages. If it used them directly, Faust could pick up upstream bug fixes and new features quickly, such as https://github.com/aio-libs/aiokafka/issues/444.

amerski99 commented 5 years ago

second this

ask commented 5 years ago

This is not currently possible without merging our changes upstream.

We no longer use a custom kafka-python, but for aiokafka we have made some fixes that are absolutely necessary.

If I remember correctly the main changes we have made are:

Kafka has an auto-create topics feature that kicks in after you have joined the group, but for some reason the Kafka client does not request a rejoin and just hangs.

For exactly-once we have modified the producer to support handling more than one transaction at once. If we don't have this we would need to use one producer for every topic partition.

If the worker is subscribed to 10 topics with 100 partitions each, that would mean starting 1000 producer instances, each of which will have as many sockets open as there are brokers...
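To make the arithmetic above concrete, here is a minimal sketch of the worst-case count. The function name `producer_footprint` and the broker count of 3 are illustrative assumptions, not part of Faust or aiokafka, and the socket figure assumes every producer ends up connected to every broker.

```python
from typing import Tuple


def producer_footprint(topics: int, partitions_per_topic: int, brokers: int) -> Tuple[int, int]:
    """Return (producers, sockets) when one producer is used per topic partition.

    Hypothetical helper for illustration only: it assumes each producer
    eventually holds one open socket to every broker.
    """
    producers = topics * partitions_per_topic
    sockets = producers * brokers
    return producers, sockets


# 10 topics with 100 partitions each, against an assumed 3-broker cluster.
producers, sockets = producer_footprint(topics=10, partitions_per_topic=100, brokers=3)
print(producers, sockets)  # 1000 3000
```

With a single producer that can handle several transactions at once, the producer count stays at one per worker regardless of how many partitions are assigned.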

suzil commented 5 years ago

We would like to use Faust at my company for our data pipeline, but due to the old fork of aiokafka, some on the team have security concerns to the point where we'll likely roll out a simplified internal library for our microservices instead.

Of course I don't know how difficult it is to merge changes upstream and update to a newer version of aiokafka, but it's a deal-breaker for my company :disappointed:

ask commented 5 years ago

@suzil, the fork is not old; it's up to date with the latest version of aiokafka.

We will be merging it upstream, but we're not sure when that will happen.

TomGoBravo commented 5 years ago

By the way, I stumbled on this issue when trying to work out where /srv/venvs/service/trusty/service_venv_python3.6/lib/python3.6/site-packages/aiokafka/__init__.py on my machine came from. It isn't https://github.com/aio-libs/aiokafka/blob/master/aiokafka/__init__.py but https://github.com/robinhood/aiokafka/blob/robinhood4/aiokafka/__init__.py.
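For anyone else trying to trace where an installed package came from, a rough sketch along these lines may help. The helper name `describe_package` is made up for illustration, and it relies on `importlib.metadata`, which is only in the standard library from Python 3.8 onward (so it would not run as-is on the Python 3.6 environment above). A fork's distribution name can differ from its import name, so the second argument is whatever name `pip list` shows; that name is an assumption you need to supply.

```python
import importlib.util
from importlib import metadata


def describe_package(import_name, dist_name=None):
    """Report where an importable package lives and which distribution provides it.

    Hypothetical helper: `dist_name` defaults to the import name, which is
    not guaranteed to match the name the package was installed under.
    """
    dist_name = dist_name or import_name
    # find_spec locates a top-level package without executing it.
    spec = importlib.util.find_spec(import_name)
    info = {
        "import_name": import_name,
        "origin": spec.origin if spec is not None else None,
        "distribution": None,
        "version": None,
    }
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # No installed distribution under that name; leave the fields as None.
        pass
    else:
        info["distribution"] = dist.metadata["Name"]
        info["version"] = dist.version
    return info


# Prints None values if aiokafka is not installed under that name.
print(describe_package("aiokafka"))
```

If the reported distribution name or version does not match the upstream aio-libs release you expected, that is a hint that a fork is installed.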

suzil commented 5 years ago

Thanks, that makes more sense! I was looking at the master branch.