wbarnha / kafka-python-ng

Fork for Python client for Apache Kafka
https://wbarnha.github.io/kafka-python-ng/
Apache License 2.0

The serialization layer is unexpectedly processed before the producer's partitioning logic #112

Open wbarnha opened 6 months ago

wbarnha commented 6 months ago

Sample code pulled from one of our internal applications:

# kafka_producer is configured with:
#    "key_serializer": json.dumps,
#    "value_serializer": json.dumps,

key = None  # None produces round-robin
if Const.FIELD_USER in message:
    key = message[Const.FIELD_USER]
kafka_producer.send(topic, key=key, value=message)

Unsurprisingly, using json.dumps will serialize key=None to 'null'.

Surprisingly, this makes a message with key=None behave as if it were keyed: it is always sent to the same partition rather than being round-robined across partitions.

This is because the serialization layer runs before the partitioning logic. So by the time the default partitioner at https://github.com/dpkp/kafka-python/blob/1.4.4/kafka/partitioner/default.py#L24 is hit, the key is no longer None; it is already the serialized string 'null'.
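To make the ordering concrete, here is a minimal, self-contained sketch that does not use kafka-python at all. `toy_send`, `toy_partition`, and `json_bytes` are hypothetical names invented for illustration, and `zlib.crc32` is only a stand-in for whatever hash the real partitioner uses. The only thing it tries to show is that once the key has been serialized, the "key is None" branch can never be taken.

```python
import json
import random
import zlib


def json_bytes(obj):
    """Hypothetical serializer: JSON-encode, then convert to bytes."""
    return json.dumps(obj).encode("utf-8")


def toy_partition(key_bytes, num_partitions):
    """Toy stand-in for a default partitioner (not the library's code)."""
    if key_bytes is None:
        # Unkeyed message: spread across partitions
        return random.randrange(num_partitions)
    # Keyed message: deterministic hash (crc32 is a placeholder hash)
    return zlib.crc32(key_bytes) % num_partitions


def toy_send(key, key_serializer, num_partitions=6):
    """Serialize first, then partition, mirroring the order described above."""
    serialized_key = key_serializer(key)  # None becomes b'null' here
    return toy_partition(serialized_key, num_partitions)


# Every "unkeyed" message lands on the same partition
partitions_used = {toy_send(None, json_bytes) for _ in range(100)}
print(len(partitions_used))
```

Swapping `json_bytes` for a serializer that returns None when given None would let the unkeyed branch run again.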

I found this extremely surprising. At a minimum, we need to call this out in the docs.

Alternatively, we could offer default helpers that handle null keys/values (for deleting messages in compacted topics) in a less surprising way.
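As a rough sketch of what such a helper might look like (the names `none_safe` and `json_bytes` are hypothetical and not part of the library), a small wrapper can pass None through untouched and only serialize real values. This assumes the producer hands None to the configured serializer callable, which is what the behavior described above suggests.

```python
import json


def json_bytes(obj):
    """Hypothetical serializer: JSON-encode, then convert to bytes."""
    return json.dumps(obj).encode("utf-8")


def none_safe(serializer):
    """Wrap a serializer so None stays None instead of being encoded."""
    def wrapper(obj):
        if obj is None:
            # Keep None so the partitioner still sees an unkeyed message,
            # and so a None value can still act as a tombstone
            return None
        return serializer(obj)
    return wrapper


safe_json = none_safe(json_bytes)

print(safe_json(None))
print(safe_json("alice"))
```

With this in place, the producer could be configured with `key_serializer=safe_json` and `value_serializer=safe_json` in place of the bare `json.dumps`.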

Related: https://github.com/dpkp/kafka-python/issues/913.