redpanda-data / redpanda


Transactions with rack awareness #6314

Closed: freef4ll closed this issue 1 year ago

freef4ll commented 2 years ago

Who is this for and what problem do they have today?

Rack awareness with transactions does not currently appear to work. Debug logs generated by librdkafka against a cluster running with enable_transactions: true and enable_rack_awareness: true show that transactions are flakier than they are without rack awareness.
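
For context, the combination under test can presumably be enabled with commands like the following; the exact commands and the rack-a label are assumptions, not taken from this report:

# Assumed setup (not from the original report): enable both features cluster-wide
rpk cluster config set enable_transactions true
rpk cluster config set enable_rack_awareness true

# Assumed: assign a rack label in each broker's node config, then restart the broker
rpk redpanda config set redpanda.rack rack-a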

Steps to reproduce:

REDPANDA_BROKERS=redpanda-host:9092

git clone https://github.com/edenhill/librdkafka.git
cd librdkafka && ./configure && make

rpk topic create input-for-transaction
rpk topic create transaction-output --partitions 6 --replicas 3

# Produce a stream of repeating digits for the transaction reader to sum up; here each message is 5 bytes long ("1 1 1 "). Increase '-s 5' for larger messages
./examples/rdkafka_performance  -b ${REDPANDA_BROKERS} -P -t input-for-transaction -m "1 " -s 5  -r 1

# Read the input topic and produce the transactional output, convert Unix timestamps in the log output to local time, and save the log
./examples/transactions $REDPANDA_BROKERS input-for-transaction transaction-output  2>&1 |  perl -pe 's/([\d]{10})/localtime $1/eg;' | tee -a transactions_$(date +"%Y%m%d_%H%M")
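
To verify the configuration and the replica placement before running the steps above, something like the following can be used (these verification commands are an assumption, not part of the original steps):

# Assumed verification step, not in the original report
rpk cluster config get enable_transactions
rpk cluster config get enable_rack_awareness
rpk topic describe transaction-output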

Why is solving this problem impactful?

Distributed applications will use both features together to guarantee the correctness and availability of stored data.

freef4ll commented 2 years ago

Failures observed; consumer group creation is failing:

%7|Tue Sep  6 17:08:12 2022.168|RECV|rdkafka#producer-1| [thrd:TxnCoordinator]: TxnCoordinator/3: Received InitProducerIdResponse (v1, 16 bytes, CorrId 4, rtt 20.68ms)
%7|Tue Sep  6 17:08:12 2022.168|GETPID|rdkafka#producer-1| [thrd:main]: TxnCoordinator/3: Failed to acquire PID: Broker: Producer attempted a transactional operation in an invalid state
%7|Tue Sep  6 17:08:12 2022.168|IDEMPSTATE|rdkafka#producer-1| [thrd:main]: Idempotent producer state change WaitPID -> RequestPID
%4|Tue Sep  6 17:08:12 2022.168|GETPID|rdkafka#producer-1| [thrd:main]: Failed to acquire transactional PID from broker TxnCoordinator/3: Broker: Producer attempted a transactional operation in an invalid state: retrying
%7|Tue Sep  6 17:08:12 2022.168|TXN|rdkafka#producer-1| [thrd:main]: Starting PID FSM timer: Failed to acquire transactional PID from broker TxnCoordinator/3: Broker: Producer attempted a transactional operation in an invalid state

Unable to write anything, because the following warning is recorded repeatedly:

Sep  6 14:46:15 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:15,934 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3
Sep  6 14:46:16 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:16,470 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3
Sep  6 14:46:16 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:16,982 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3
Sep  6 14:46:17 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:17,519 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3
Sep  6 14:46:18 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:18,041 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3
Sep  6 14:46:18 redpanda-5 rpk[278920]: WARN  2022-09-06 16:46:18,568 [shard 0] tx - tx_gateway_frontend.cc:776 - got error cluster::tx_errc:19 on rolling previous tx.id={librdkafka_transactions_example_16807} with status=3

Adjusting the expiration to 5 minutes doesn't help abort this transaction; the lock to purge is acquired, but the group is not purged:

PROPERTY                        PRIOR      NEW
transactional_id_expiration_ms  604800000  300000
Sep  6 15:00:17 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:00:17,015 [shard 1] tx - tx_gateway_frontend.cc:44 - got_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:00:17 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:00:17,015 [shard 1] tx - tx_gateway_frontend.cc:48 - released_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:02:47 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:02:47,019 [shard 1] tx - tx_gateway_frontend.cc:44 - got_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:02:47 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:02:47,019 [shard 1] tx - tx_gateway_frontend.cc:48 - released_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}

Sep  6 15:05:17 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:05:17,026 [shard 1] tx - tx_gateway_frontend.cc:44 - got_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:05:17 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:05:17,026 [shard 1] tx - tx_gateway_frontend.cc:48 - released_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:07:47 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:07:47,031 [shard 1] tx - tx_gateway_frontend.cc:44 - got_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
Sep  6 15:07:47 redpanda-4 rpk[278488]: TRACE 2022-09-06 17:07:47,032 [shard 1] tx - tx_gateway_frontend.cc:48 - released_lock name:expire_old_tx, tx_id:{librdkafka_transactions_example_16807}
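
For reference, the PRIOR/NEW output above suggests the expiration was lowered via rpk; the exact invocation is not in the report, but it was presumably something like:

# Assumed command used to lower the transactional id expiration to 5 minutes
rpk cluster config set transactional_id_expiration_ms 300000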

EDIT: the only way I managed to clean this up was by wiping the entire cluster's data; the issue reproduced again, so I created https://github.com/redpanda-data/redpanda/issues/6327

freef4ll commented 2 years ago

Running the steps to reproduce above yields the following librdkafka debug log:

%7|Wed Sep  7 09:31:50 2022.294|TOPPAR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2] 1 message(s) in xmit queue (1 added from partition queue)
%7|Wed Sep  7 09:31:50 2022.294|TXNAPI|rdkafka#producer-1| [thrd:app]: Transactional API called: send_offsets_to_transaction (in txn state InTransaction, idemp state Assigned)
%7|Wed Sep  7 09:31:50 2022.294|SEND|rdkafka#producer-1| [thrd:TxnCoordinator]: TxnCoordinator/4: Sent AddOffsetsToTxnRequest (v0, 99 bytes @ 0, CorrId 105)
%7|Wed Sep  7 09:31:50 2022.299|TOPPAR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2] 1 message(s) in xmit queue (0 added from partition queue)
%7|Wed Sep  7 09:31:50 2022.299|PRODUCE|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2]: Produce MessageSet with 1 message(s) (71 bytes, ApiVersion 7, MsgVersion 2, MsgId 628, BaseSeq 627, PID{Id:1,Epoch:4}, uncompressed)
%7|Wed Sep  7 09:31:50 2022.299|SEND|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: Sent ProduceRequest (v7, 173 bytes @ 0, CorrId 50)
%7|Wed Sep  7 09:31:50 2022.307|RECV|rdkafka#producer-1| [thrd:TxnCoordinator]: TxnCoordinator/4: Received AddOffsetsToTxnResponse (v0, 6 bytes, CorrId 105, rtt 12.20ms)
%7|Wed Sep  7 09:31:50 2022.307|ADDOFFSETS|rdkafka#producer-1| [thrd:main]: AddOffsetsToTxn response from TxnCoordinator/4: NO_ERROR ()
%7|Wed Sep  7 09:31:50 2022.307|SEND|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: Sent TxnOffsetCommitRequest (v3, 192 bytes @ 0, CorrId 51)
%7|Wed Sep  7 09:31:50 2022.308|RECV|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: Received ProduceResponse (v7, 62 bytes, CorrId 50, rtt 8.87ms)
%7|Wed Sep  7 09:31:50 2022.308|REQERR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: ProduceRequest failed: Broker: Not leader for partition: actions Refresh,MsgNotPersisted
%7|Wed Sep  7 09:31:50 2022.308|MSGSET|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2]: MessageSet with 1 message(s) (MsgId 628, BaseSeq 627) encountered error: Broker: Not leader for partition (actions Refresh,MsgNotPersisted)
%7|Wed Sep  7 09:31:50 2022.308|BROKERUA|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: transaction-output [2]: broker unavailable: produce: Broker: Not leader for partition
%7|Wed Sep  7 09:31:50 2022.308|FASTQUERY|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: Starting fast leader query
%7|Wed Sep  7 09:31:50 2022.308|DRAIN|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: transaction-output [2] beginning partition drain: drain before retrying
%7|Wed Sep  7 09:31:50 2022.308|TOPPAR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2] all in-flight requests drained from queue
%7|Wed Sep  7 09:31:50 2022.308|TOPPAR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2] 1 message(s) in xmit queue (1 added from partition queue)
%7|Wed Sep  7 09:31:50 2022.318|RECV|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: Received TxnOffsetCommitResponse (v3, 38 bytes, CorrId 51, rtt 10.90ms)
%7|Wed Sep  7 09:31:50 2022.318|TOPPAR|rdkafka#producer-1| [thrd:10.142.65.91:9092/3]: 10.142.65.91:9092/3: transaction-output [2] 1 message(s) in xmit queue (0 added from partition queue)
%7|Wed Sep  7 09:31:50 2022.318|TXNAPI|rdkafka#producer-1| [thrd:app]: Transactional API called: commit_transaction (begin) (in txn state InTransaction, idemp state Assigned)
%7|Wed Sep  7 09:31:50 2022.318|TXNSTATE|rdkafka#producer-1| [thrd:main]: Transaction state change InTransaction -> BeginCommit
%7|Wed Sep  7 09:31:50 2022.318|TXNCOMMIT|rdkafka#producer-1| [thrd:app]: Flushing 58 outstanding message(s) prior to commit
%7|Wed Sep  7 09:31:50 2022.318|WAKEUP|rdkafka#producer-1| [thrd:app]: 10.142.65.91:9092/3: Wake-up: flushing
%7|Wed Sep  7 09:31:50 2022.318|WAKEUP|rdkafka#producer-1| [thrd:app]: 10.142.65.124:9092/1: Wake-up: flushing
%7|Wed Sep  7 09:31:50 2022.318|WAKEUP|rdkafka#producer-1| [thrd:app]: 10.142.65.152:9092/4: Wake-up: flushing
%7|Wed Sep  7 09:31:50 2022.318|WAKEUP|rdkafka#producer-1| [thrd:app]: 10.142.65.14:9092/0: Wake-up: flushing
%7|Wed Sep  7 09:31:50 2022.318|WAKEUP|rdkafka#producer-1| [thrd:app]: TxnCoordinator/4: Wake-up: flushing

Partition leader migration then happens:

%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]: 10.142.65.91:9092/3:   Topic #0/1: transaction-output with 6 partitions
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 0 Leader 1
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 1 Leader 1
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 2 Leader 2
%7|Wed Sep  7 09:31:50 2022.573|TOPICUPD|rdkafka#producer-1| [thrd:main]: Topic transaction-output [2]: migrating from broker 3 to 2 (leader is 2): leader updated
%7|Wed Sep  7 09:31:50 2022.573|BRKDELGT|rdkafka#producer-1| [thrd:main]: transaction-output [2]: delegate to broker 10.142.65.197:9092/2 (rktp 0x14f834e00, term 0, ref 4)
%7|Wed Sep  7 09:31:50 2022.573|BRKDELGT|rdkafka#producer-1| [thrd:main]: transaction-output [2]: no longer delegated to broker 10.142.65.91:9092/3
%7|Wed Sep  7 09:31:50 2022.573|BRKDELGT|rdkafka#producer-1| [thrd:main]: transaction-output [2]: delegating to broker 10.142.65.197:9092/2 for partition with 0 messages (0 bytes) queued
%7|Wed Sep  7 09:31:50 2022.573|BRKMIGR|rdkafka#producer-1| [thrd:main]: Migrating topic transaction-output [2] 0x14f834e00 from 10.142.65.91:9092/3 to 10.142.65.197:9092/2 (sending PARTITION_LEAVE to 10.142.65.91:9092/3)
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 3 Leader 0
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 4 Leader 4
%7|Wed Sep  7 09:31:50 2022.573|METADATA|rdkafka#producer-1| [thrd:main]:   Topic transaction-output partition 5 Leader 3
%7|Wed Sep  7 09:31:50 2022.636|REQERR|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: 10.142.65.197:9092/2: ProduceRequest failed: Broker: Producer attempted an operation with an old epoch: explicit actions Permanent,MsgNotPersisted
%7|Wed Sep  7 09:31:50 2022.636|MSGSET|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: 10.142.65.197:9092/2: transaction-output [2]: MessageSet with 1 message(s) (MsgId 628, BaseSeq 627) encountered error: Broker: Producer attempted an operation with an old epoch (actions Permanent,MsgNotPersisted)
%1|Wed Sep  7 09:31:50 2022.636|TXNERR|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: Fatal transaction error: ProduceRequest for transaction-output [2] with 1 message(s) failed: Broker: Producer attempted an operation with an old epoch (broker 2 PID{Id:1,Epoch:4}, base seq 627): transactional producer fenced by newer producer instance (_FENCED)
%7|Wed Sep  7 09:31:50 2022.636|FATAL|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: Fatal error: Local: This instance has been fenced by a newer instance: ProduceRequest for transaction-output [2] with 1 message(s) failed: Broker: Producer attempted an operation with an old epoch (broker 2 PID{Id:1,Epoch:4}, base seq 627): transactional producer fenced by newer producer instance
%3|Wed Sep  7 09:31:50 2022.636|ERROR|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: Fatal error: Local: This instance has been fenced by a newer instance: ProduceRequest for transaction-output [2] with 1 message(s) failed: Broker: Producer attempted an operation with an old epoch (broker 2 PID{Id:1,Epoch:4}, base seq 627): transactional producer fenced by newer producer instance
%7|Wed Sep  7 09:31:50 2022.636|TXNAPI|rdkafka#producer-1| [thrd:app]: Transactional API called: commit_transaction (in txn state FatalError, idemp state FatalError)
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:TxnCoordinator]: TxnCoordinator/4: Purging queues with flags queue,non-blocking
%7|Wed Sep  7 09:31:50 2022.636|TXN|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: Starting PID FSM timer (fire immediately): Drain done
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:10.142.65.124:9092/1]: transaction-output [0]: purging queues (purge_flags 0x5, include xmit_msgq)
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:10.142.65.124:9092/1]: transaction-output [1]: purging queues (purge_flags 0x5, include xmit_msgq)
%7|Wed Sep  7 09:31:50 2022.636|PURGEQ|rdkafka#producer-1| [thrd:10.142.65.124:9092/1]: 10.142.65.124:9092/1: Purged 0 message(s) from 0 partition(s)
%7|Wed Sep  7 09:31:50 2022.636|PURGEQ|rdkafka#producer-1| [thrd:main]: Purged 0 message(s) from 0 UA-partition(s)
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: 10.142.65.197:9092/2: Purging queues with flags queue,non-blocking
%7|Wed Sep  7 09:31:50 2022.636|PURGEQ|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: 10.142.65.197:9092/2: Purged 0 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
%7|Wed Sep  7 09:31:50 2022.637|PURGE|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: transaction-output [2]: purging queues (purge_flags 0x5, include xmit_msgq)
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:10.142.65.152:9092/4]: 10.142.65.152:9092/4: Purging queues with flags queue,non-blocking
FATAL ERROR: Failed to commit transaction: _FENCED: ProduceRequest for transaction-output [2] with 1 message(s) failed: Broker: Producer attempted an operation with an old epoch (broker 2 PID{Id:1,Epoch:4}, base seq 627): transactional producer fenced by newer producer instance
%7|Wed Sep  7 09:31:50 2022.637|PURGEQ|rdkafka#producer-1| [thrd:10.142.65.197:9092/2]: 10.142.65.197:9092/2: Purged 0 message(s) from 0 partition(s)
%7|Wed Sep  7 09:31:50 2022.636|PURGE|rdkafka#producer-1| [thrd:10.142.65.14:9092/bootstrap]: 10.142.65.14:9092/0: Purging queues with flags queue,non-blocking

At the same time, a leadership transfer happened on the broker side:

Sep  7 07:31:50 redpanda-4 rpk[290228]: TRACE 2022-09-07 09:31:50,009 [shard 0] raft - [group_id:28, {kafka/transaction-output/2}] consensus.cc:1474 - Vote request: {node_id: {id: {2}, revision: {140}}, target_node_id{id: {3}, revision: {140}}, group: {28}, term:{2}, prev_log_index:{640}, prev_log_term: {1}, leadership_xfer: 1}
Sep  7 07:31:50 redpanda-4 rpk[290228]: INFO  2022-09-07 09:31:50,009 [shard 0] raft - [group_id:28, {kafka/transaction-output/2}] consensus.cc:1566 - Received vote request with larger term from node {id: {2}, revision: {140}}, received 2, current 1
Sep  7 07:31:50 redpanda-4 rpk[290228]: TRACE 2022-09-07 09:31:50,009 [shard 1] raft - [group_id:4, {kafka/__consumer_offsets/3}] consensus.cc:1474 - Vote request: {node_id: {id: {2}, revision: {84}}, target_node_id{id: {3}, revision: {56}}, group: {4}, term:{7}, prev_log_index:{19}, prev_log_term: {6}, leadership_xfer: 1}
Sep  7 07:31:50 redpanda-4 rpk[290228]: INFO  2022-09-07 09:31:50,009 [shard 1] raft - [group_id:4, {kafka/__consumer_offsets/3}] consensus.cc:1566 - Received vote request with larger term from node {id: {2}, revision: {84}}, received 7, current 6
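
The broker-side lines above come from syslog; assuming a systemd deployment with the default unit name, they can be correlated with the client's fatal error using something like the following (the unit name and time window are assumptions):

# Assumed log-correlation step, not part of the original report
journalctl -u redpanda --since "2022-09-07 09:31:45" --until "2022-09-07 09:31:55" | grep -E "Vote request|leadership_xfer|tx -"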

Attachments: transactions_20220907_0921_after_enabling_rack.txt, redpanda_cluster_log.txt.gz

piyushredpanda commented 2 years ago

Sounds like https://github.com/redpanda-data/redpanda/issues/6213 to me. @bharathv wdyt?

bharathv commented 2 years ago

Yes, it seems like leadership changes are contributing to this (part of the problem). Some state can be lost if a leadership change happens on the tx coordinator or data partitions (#6213, #6214) while transactions are in progress, resulting in weird errors such as unknown_server. The situation is probably exacerbated by the leadership balancer, which moves leaders around the cluster. We are working on some changes to improve the behavior in such cases and provide a better user experience.
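
As a temporary way to reduce the leadership churn described above while reproducing, the leader balancer can presumably be disabled (this workaround is a suggestion, not something stated in the thread):

# Assumed workaround, not from the thread: disable the leader balancer while testing
rpk cluster config set enable_leader_balancer false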

BenPope commented 1 year ago

@bharathv where are we with this?

bharathv commented 1 year ago

I think this can be closed. @freef4ll, agree?

freef4ll commented 1 year ago

Yes, I think it can be closed, as this should be fixed by https://github.com/redpanda-data/redpanda/pull/7061 and https://github.com/redpanda-data/redpanda/pull/7695.