pipelinedb / pipeline_kafka

PipelineDB extension for Kafka support

pipeline_kafka rewrite #42

Open usmanm opened 7 years ago

usmanm commented 7 years ago

We're going to keep the 0.8.2.2 branch alive for Kafka 0.8. The master branch is only going to be compatible with Kafka 0.9+.

The new pipeline_kafka will have a completely new API. Some notes about the implementation and features:

Unanswered questions:

@derekjn: Thoughts?

derekjn commented 7 years ago

This all sounds good to me. Regarding replay, I think an ephemeral bgworker (or maybe a group of them) will work just fine for that.

The other important thing I think we need to address is packaging. The ZK client library is a huge pain to install on non-Ubuntu systems; there just happens to be an apt repo for zookeeper_mt. Building and installing librdkafka is easy and clean, but zookeeper_mt is not, so I don't feel great about deferring that complexity to users.

Some options here:

The main thing is that adding the ZK dependency means it is no longer reasonable for us to expect users to build and install pipeline_kafka themselves.

usmanm commented 7 years ago

We might have to bump this up, since the way our parallelism currently works causes a latency of parallelism * timeout. This is because each partition is consumed independently and we wait for the timeout when polling each partition. Super low timeouts burn a lot of CPU, so that's not a fix.
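
To make the latency arithmetic above concrete, here is a toy model of a consumer that polls its partitions one after another. It is only a sketch under stated assumptions, not pipeline_kafka code: the function name `worst_case_delay_ms` and the model itself (every other partition idle, a blocking poll that waits the full timeout when nothing is available) are hypothetical.

```python
def worst_case_delay_ms(num_partitions, timeout_ms):
    """Delay seen by a message that lands on partition 0 just after
    partition 0's poll timed out, while all other partitions are idle.

    Hypothetical model: partitions are polled sequentially, and a poll on
    an empty partition blocks for the full timeout.
    """
    clock = 0

    # Poll partition 0: nothing there, so the poll blocks for the timeout.
    clock += timeout_ms
    arrival = clock  # the message shows up right as that poll returns

    # Every remaining partition is idle, so each poll burns a full timeout.
    for _ in range(1, num_partitions):
        clock += timeout_ms

    # Back at partition 0: the message is waiting, so the poll returns at once.
    return clock - arrival


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "partitions:", worst_case_delay_ms(n, timeout_ms=250), "ms")
```

In this model the delay is (parallelism - 1) * timeout, which is bounded by the parallelism * timeout figure above and grows linearly with the number of partitions. Lowering the timeout shrinks the bound but means more poll calls per second, which is the CPU cost mentioned above.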

usmanm commented 7 years ago

For replay, we can probably start off by just running the replay code in the client process.

simplesteph commented 7 years ago

More suggestions:

usmanm commented 7 years ago

Thanks for the suggestions @simplesteph! Will incorporate them when writing the new client.

maver1ck commented 6 years ago

Hi, what about Avro support here?