Currently the FlinkFixedPartitioner uses a static mapping i.e. subtask_id % num_partitions and the subtask_id is dependent on the parallelism config specified.
This has 2 problems:
The parallelism is specified for the producer (upstream job) and can be different from the downstream job.
The number of partitions could be different across the upstream and downstream jobs
To avoid skew by not explicitly specifying a partitioner flink will fallback on Kafka producer's default partitioning strategy - which is round-robin.
Currently the FlinkFixedPartitioner uses a static mapping i.e. subtask_id % num_partitions and the subtask_id is dependent on the parallelism config specified.
This has 2 problems:
To avoid skew by not explicitly specifying a partitioner flink will fallback on Kafka producer's default partitioning strategy - which is round-robin.