hypertrace / span-normalizer

A streaming job that converts the incoming spans into Hypertrace's raw span format
Apache License 2.0

Don't explicitly set a flink kafka partitioner #4

Closed surajpuvvada closed 4 years ago

surajpuvvada commented 4 years ago

Currently the FlinkFixedPartitioner uses a static mapping, i.e. subtask_id % num_partitions, and the subtask_id depends on the configured parallelism.

This has 2 problems:

  1. The parallelism is specified for the producer (upstream job) and can differ from that of the downstream job.
  2. The number of partitions can differ between the upstream and downstream jobs.
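
To make the skew concrete, here is a minimal sketch of the static mapping described above. It does not use Flink itself; the class and method names (`FixedPartitionerSkew`, `targetPartition`, `writersPerPartition`) are hypothetical, and the parallelism and partition counts are made-up values chosen only for illustration.

```java
import java.util.Arrays;

public class FixedPartitionerSkew {

    // The static mapping from the description: subtask_id % num_partitions.
    static int targetPartition(int subtaskId, int numPartitions) {
        return subtaskId % numPartitions;
    }

    // Counts how many producer subtasks end up writing to each partition.
    static int[] writersPerPartition(int parallelism, int numPartitions) {
        int[] writers = new int[numPartitions];
        for (int subtaskId = 0; subtaskId < parallelism; subtaskId++) {
            writers[targetPartition(subtaskId, numPartitions)]++;
        }
        return writers;
    }

    public static void main(String[] args) {
        // Parallelism 2, 6 partitions: four partitions never receive data.
        System.out.println(Arrays.toString(writersPerPartition(2, 6)));
        // Parallelism 8, 6 partitions: partitions 0 and 1 each get two writers.
        System.out.println(Arrays.toString(writersPerPartition(8, 6)));
    }
}
```

With fewer subtasks than partitions some partitions stay empty, and with more subtasks than partitions the first few partitions receive a larger share, so the load on the downstream consumers is uneven in both cases.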

To avoid this skew, we should not explicitly specify a partitioner. Flink will then fall back on the Kafka producer's default partitioning strategy, which is round-robin.