Closed jo3bingham closed 4 years ago
I haven't looked at transactions in over a year (maybe closer to two), but unless I'm mistaken, when you call commit, what happens is that the transaction coordinator writes a transaction commit marker to the partition. A transaction commit marker is how a consumer knows whether or not a certain message has actually been committed or is currently part of an open transaction (or one that has been rolled back).
When a consumer reads from the partition, any message that does not have an associated transaction commit marker will not be returned to the user (assuming the client understands transactions and is configured to only read committed messages). So the phantom message you are talking about is that transaction commit marker.
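As a rough illustration of that filtering, here is a toy model of a partition log in Python. It is only a sketch: the `Record` type, the `txn_id` field, and the `read_committed` helper are hypothetical names invented for this example, and the real protocol is considerably more involved than a single pass over a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    offset: int
    value: Optional[str]           # None for control records (markers)
    txn_id: Optional[int]          # None for non-transactional records
    marker: Optional[str] = None   # "COMMIT" or "ABORT" for control records


def read_committed(log: List[Record]) -> List[Record]:
    """Return only the records a transaction-aware consumer would hand to the user."""
    committed = {r.txn_id for r in log if r.marker == "COMMIT"}
    return [
        r for r in log
        if r.marker is None                                # markers are never returned
        and (r.txn_id is None or r.txn_id in committed)    # plain or committed data only
    ]


# Hypothetical log: one committed, one aborted, and one still-open transaction.
log = [
    Record(0, "a", txn_id=1),
    Record(1, None, txn_id=1, marker="COMMIT"),
    Record(2, "b", txn_id=2),
    Record(3, None, txn_id=2, marker="ABORT"),
    Record(4, "c", txn_id=3),  # no marker yet: transaction still open
]

visible = read_committed(log)
print([(r.offset, r.value) for r in visible])  # [(0, 'a')]
```

Note that the marker at offset 1 occupies an offset but is never returned, which is what makes it look like a phantom message from the consumer's side.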
You can verify this by publishing N messages within a single transaction before committing, and you should see the offset incrementing by N+1.
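The arithmetic behind that check can be sketched with a plain Python list standing in for the partition. This assumes the simplified picture described above (one offset per message plus one offset for the commit marker); `commit_transaction` is a made-up helper, not a client API.

```python
def commit_transaction(log, values):
    """Append each value, then one commit marker; return the new log end offset."""
    for v in values:
        log.append(("data", v))
    log.append(("control", "COMMIT"))
    return len(log)


log = []
n = 5
before = len(log)
after = commit_transaction(log, [f"msg-{i}" for i in range(n)])

print(after - before)  # 6, i.e. n + 1
```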
Feel free to re-open if I'm wrong about this, but I'm 95% sure that this is the case. You can read more about how transactions work here.
Following up on #682, I wanted to know why messages sent from our producer had even-numbered offsets in Kafka and why the high watermark for that topic/partition was always `NumberOfMessages+1`. So I created a simple topic:
Then I created a simple transactional producer, and an admin client to display the high watermark:
The first run of the code produced:
The second produced:
The third produced:
As you can see, the offset and high watermark increment by two even though I'm only sending one message.
Next, I modified the code to send the message directly with the producer, without a transaction:
The first run of the modified code produced:
The second produced:
The third produced:
Now the offset and high watermark increment by only one, as one would expect.
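For comparison, the two sets of runs can be replayed against a toy log in Python. This is a sketch under the assumption that each committed transaction appends exactly one marker after its messages and that the topic starts empty; the function names are hypothetical, and the printed numbers come from this model, not from the original output.

```python
log = []  # each entry is ("data", value) or ("control", "COMMIT")


def send_in_transaction(value):
    """One message inside a transaction, followed by its commit marker."""
    log.append(("data", value))
    log.append(("control", "COMMIT"))
    return len(log)  # log end offset after the commit


def send_plain(value):
    """One message sent without a transaction: no marker is written."""
    log.append(("data", value))
    return len(log)


txn_ends = [send_in_transaction(f"txn-{i}") for i in range(3)]
plain_ends = [send_plain(f"plain-{i}") for i in range(3)]
consumed = [value for kind, value in log if kind == "data"]

print(txn_ends)       # [2, 4, 6]
print(plain_ends)     # [7, 8, 9]
print(len(consumed))  # 6
```

In this model the log ends at offset 9 while only six entries are user-visible messages; the remaining three offsets belong to commit markers.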
Consuming the topic from the beginning retrieves six messages (as expected since I sent 3 via transaction and 3 via producer):
Output: