mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

Memory map kafka checkpoint file #1950

Open andremedeiros opened 8 years ago

andremedeiros commented 8 years ago

Problem

Heka's KafkaInput plugin writes the current offset to its checkpoint file on disk for every Kafka message it consumes. On a low-volume topic this doesn't cause any issues, but on high-volume topics disk I/O becomes the limiting factor in how quickly Heka can process messages.

Solution

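Instead of writing the offset to disk on every message, this PR memory-maps (mmaps) the checkpoint file, so each offset update becomes an in-memory write and the kernel writes the page back to disk asynchronously. A quick benchmark comparing the two approaches shows the advantage: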

BenchmarkFile-8                  10000         119687 ns/op
BenchmarkMmap-8             2000000000           0.64 ns/op

No new tests were written; the existing tests that verify checkpoint correctness still pass.

r. @fbogsany @rafrombrc