shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0
83 stars 16 forks source link

Kafka: fix producing with acks=0 #1630

Closed rukai closed 3 months ago

rukai commented 3 months ago

Discovery

This managed to avoid being hit because the test cases were running acks=0 last and never sending any more messages down those connections. But while investigating https://github.com/shotover/shotover-proxy/pull/1588 this started showing up in the acks0 test case presumably because the java driver was reusing the connections it sent the acks=0 produce requests down.

Background info

Codecs are an optional helper mechanism provided by the tokio_util crate. They make it easy to attach an implementation of a protocol to a TCP socket. In shotover every database that we support has a unique codec implemented for its protocol. There is no single Codec type, instead there are separate read and write codecs, called the Encoder and the Decoder.

The kafka protocol does not specify the message type of the responses sent by the kafka broker. Instead it is up to the client to remember the message type of each request it sends out and then match up each response that comes in with its corresponding request so that the response can be correctly decoded.

In order to decode kafka responses, shotover needs the kafka outgoing encoder to instruct the outgoing decoder on how to decode each response. This is done via an mpsc channel, specifically we use the mpsc channel from the standard library.

https://github.com/shotover/shotover-proxy/blob/c1331f8944a1d2b1b005bfd6f728aecfd9b5154c/shotover/src/codec/kafka.rs#L42-L48

The problem

However there is one exception to this that we were not handling! The kafka produce request can set a field named acks to determine how many acks the leader broker must receive before responding to the request. As a special case if acks = 0 then the leader broker will never respond to that request. This is used if latency is the highest concern and lost messages are acceptable.

Shotover will never receive a response for a produce with acks=0. So if the encoder tells the decoder to expect a produce response it will expect the next message to be a produce even though that is not the case. This results in attempting to decode responses as the wrong type and therefore failure to decode.

The fix

The correct fix here is to have the encoder skip sending message type information if acks = 0. Specifically that is done by checking the result of the Message::response_is_dummy() method which checks if the request will not receive a response, including a check for acks = 0 on kafka messages.

Now that the issue is fixed we can enable the failing tests in kafka_int_tests/test_cases.rs.

codspeed-hq[bot] commented 3 months ago

CodSpeed Performance Report

Merging #1630 will improve performances by 16.51%

Comparing rukai:kafka_fix_acks0 (f0f528d) with main (3cdefda)

Summary

⚡ 1 improvements ✅ 36 untouched benchmarks

Benchmarks breakdown

Benchmark main rukai:kafka_fix_acks0 Change
encode_request_produce 25.9 µs 22.2 µs +16.51%
rukai commented 3 months ago

The encode_request_produce microbenchmark above slightly improved performance since acks=0 no longer sends the header info to the decoder.