codefromthecrypt opened 8 years ago
cc @eirslett @kristofa @prat0318 @liyichao @anuraaga @basvanbeek @abesto thoughts?
For example, right now, if too many writes go to Elasticsearch, we get logs like this. It seems that with Kafka, we could have an opportunity to simply read less.
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$4@3b438fe4 on EsThreadPoolExecutor[index, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6ed30c21[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 72]]
Sounds like a good idea to me!
Just curious: we are thinking of modifying just the Kafka transport because it can handle throttling and neither HTTP nor Scribe can, correct?
Overall, it looks like a good idea.
@prat0318 exactly
Hi, any update?
Here's the current thinking.
So running Kafka implies running ZooKeeper. We have an adaptive sampler which uses ZooKeeper to store the target rate the storage layer is capable of. We could reuse this code to rate-limit Kafka consumers (and/or drop spans) without adding a new service dependency, because Kafka already needs ZooKeeper.
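To make the rate-limit (and/or drop) idea concrete, here is a minimal sketch of a fixed-rate limiter in plain Java. The class name and the fixed rate are hypothetical stand-ins; in the adaptive-sampler idea above, the rate would instead come from ZooKeeper.

```java
public class SimpleRateLimiter {
    // Interval-based limiter: at most `permitsPerSecond` acquisitions per second.
    // A stand-in for the adaptive rate stored in ZooKeeper; here the rate is fixed.
    private final long intervalNanos;
    private long nextFreeAt = System.nanoTime();

    public SimpleRateLimiter(double permitsPerSecond) {
        this.intervalNanos = (long) (1_000_000_000L / permitsPerSecond);
    }

    // Returns false instead of sleeping, so the caller can choose to drop
    // the message (the "and/or drop" option) or back off before polling again.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        if (now < nextFreeAt) return false;
        nextFreeAt = Math.max(nextFreeAt, now) + intervalNanos;
        return true;
    }
}
```

A consumer loop could call `tryAcquire()` before each storage write, dropping or delaying when it returns false.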
There's a problem we might have, which is that kafka marks slow consumers dead (I hear).
thoughts?
Now's a good time to start working on this, since the Elasticsearch code is stabilizing. cc @openzipkin/elasticsearch
The easiest, dead-simple start on this is to make storage calls block the Kafka stream loop.
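A minimal sketch of that "dead-simple" version, with the poll source and storage call abstracted as plain Java so it runs without a broker (`drainBlocking` and all names here are hypothetical, not Zipkin's actual API). The point is simply that the loop doesn't take the next batch until storage has accepted the previous one:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BlockingKafkaLoop {
    // Blocking the stream loop: don't take the next batch until storage
    // has accepted the current one. `polls` stands in for repeated
    // consumer.poll(...) calls; `storage` stands in for the storage write.
    static <T> int drainBlocking(Iterator<List<T>> polls, Consumer<List<T>> storage) {
        int stored = 0;
        while (polls.hasNext()) {
            List<T> batch = polls.next();  // next poll happens only after...
            storage.accept(batch);         // ...this blocking write completes
            stored += batch.size();
        }
        return stored;
    }

    public static void main(String[] args) {
        List<List<String>> polls = List.of(List.of("span1", "span2"), List.of("span3"));
        List<String> sink = new ArrayList<>();
        int n = drainBlocking(polls.iterator(), sink::addAll);
        System.out.println(n + " spans stored: " + sink);
    }
}
```

With a real consumer, a slow `storage.accept` simply delays the next poll, so consumption settles at the storage rate.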
Is there a third-party library that supports Kafka with Zipkin?
If you look at zipkin-aws and zipkin-azure, in both cases you can add a custom collector. So yes, you could make a third-party Kafka collector that does backpressure nicely. However, it would be even nicer to contribute such a change upstream, as all we are missing is hands to help!
Kafka will only hand you the messages as fast as you can receive them, so indeed, if backpressure is ever needed we could make a blocking version of the collector.
> There's a problem we might have, which is that kafka marks slow consumers dead (I hear).
Indeed, this refers to https://kafka.apache.org/documentation/#max.poll.interval.ms. The default value is 5 minutes, though; if the storage write from the latest message poll doesn't complete within 5 minutes, you have other issues.
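For reference, the two consumer settings most relevant here; the values shown are the Kafka defaults per its documentation:

```properties
# Max delay between poll() calls before the consumer is considered failed
# and the group rebalances. Default: 300000 ms (5 minutes).
max.poll.interval.ms=300000
# Max records returned by a single poll(); shrinking this bounds how much
# storage work each loop iteration must finish before the next poll.
max.poll.records=500
```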
The HTTP and Scribe collectors accept requests and return early; a thread later persists to storage or fails. We need to do this because we don't want instrumentation to block.
We use the same approach in kafka at the moment, but we might want to reconsider it.
Right now, a surge in writes to Elasticsearch results in errors, as Elasticsearch drops messages it can't store. Instead of reading more than we know we can store, why don't we slow down and consume from Kafka at the rate we can store?
For example, we could implement backpressure by linking the storage operations to the Kafka consumer threads. If storage operations slow down, the Kafka queue will build up, which is relatively easy to monitor.
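One hedged sketch of that linkage (class and method names are hypothetical, not Zipkin's API): a bounded hand-off queue between the Kafka poll loop and the storage writer. When storage falls behind, the queue fills and `put` blocks the poll loop, so the backlog surfaces as Kafka consumer lag rather than as Elasticsearch rejections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandoff {
    // Bounded queue between the consumer loop and the storage writer thread.
    final BlockingQueue<String> pending;

    public BoundedHandoff(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    // Called from the kafka poll loop: blocks when storage is behind,
    // which is exactly the backpressure we want.
    public void onMessage(String encodedSpan) {
        try {
            pending.put(encodedSpan); // blocks while the queue is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // Run by a storage writer thread: drains one message to persist.
    public String takeForStorage() throws InterruptedException {
        return pending.take();
    }
}
```

The queue capacity caps memory use, and its depth is a single number you can export as a metric to watch storage health.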
Any thoughts on the idea or other alternatives?