mit-dci / opencbdc-tx

A transaction processor for a hypothetical, general-purpose, central bank digital currency

Analyze Atomizer's performance bottleneck #75

Open wadagso-gertjaap opened 2 years ago

wadagso-gertjaap commented 2 years ago

Question

In the Phase 1 paper of Project Hamilton, we concluded that "the atomizer itself is the limiting factor for the overall system as all transaction notifications have to be routed via the atomizer Raft cluster." Where exactly does the Atomizer's bottleneck lie, and can it be alleviated?

Benefit

Knowing exactly where the Atomizer's bottleneck lies lets us explore potential solutions to make the Atomizer perform better, or make informed recommendations about the infrastructure on which the Atomizer performs best. For instance: if the problem lies in bandwidth between Atomizer Raft cluster participants, we can recommend dedicated, high-bandwidth, low-latency network connections between them. If the problem lies in Raft log storage, we can recommend solid-state storage. And so on.

Proposed Solution

We made a local benchmarking setup that is isolated in a separate branch. The documentation in that branch describes how to run the benchmark here.

Our initial approach was to log only the event of receiving the transaction and not process it further. This led to a peak of about 350k TX/s. We then began moving the point at which the transaction is discarded further along the atomizer's processing path. Transaction processing follows this code path:

  1. Transaction is received by cbdc::atomizer::controller::server_handler
  2. Transaction is received by cbdc::atomizer::atomizer_raft::tx_notify
    • This method is called by each of the individual shards, with attestations for their respective inputs
    • This method places the transaction in a mutex-guarded vector of complete transactions, m_complete_txs, here
  3. Transactions are compiled into batches by cbdc::atomizer::atomizer_raft::send_complete_txs
    • This method builds an aggregate_tx_notify_request containing multiple tx_notify structs and sends it to the Raft state machine
  4. Batches of tx_notify are then received and processed by cbdc::atomizer::state_machine::commit
  5. For each transaction, cbdc::atomizer::atomizer::insert_complete is called to verify that the attestations are within the STXO cache depth (thus not expired) and that none of the inputs are in the STXO cache (already spent)
    • If all these checks succeed, the transaction is added to the STXO cache and to m_complete_txs, a vector of transactions to be included in the next block, here (a simplified sketch of steps 5 and 6 follows this list)
  6. The transaction is then included in the block by swapping out the m_complete_txs vector in cbdc::atomizer::atomizer::make_block here
  7. The block is then returned to the cbdc::atomizer::state_machine::commit function here
  8. The block is then returned to the Raft callback in cbdc::atomizer::controller::raft_result_handler here, where it is distributed to the watchtowers and shards
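
To make steps 5 and 6 concrete, here is a minimal, self-contained sketch of that logic. All types are hypothetical stand-ins, and the single flat spent-set is a simplification of mine (the real STXO cache also expires spent outputs once they fall outside the configured depth); only the shape of the checks and the vector swap mirror the description above.

```cpp
// Simplified sketch of steps 5 and 6; the real implementation lives in
// cbdc::atomizer::atomizer.
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

struct out_hash {
    uint64_t m_val{};
    bool operator==(const out_hash& rhs) const { return m_val == rhs.m_val; }
};
struct out_hash_hasher {
    std::size_t operator()(const out_hash& h) const {
        return static_cast<std::size_t>(h.m_val);
    }
};
struct transaction {
    std::vector<out_hash> m_inputs;
};

class atomizer_sketch {
  public:
    // Step 5: reject attestations older than the STXO cache depth, and
    // transactions spending an input already in the cache; otherwise queue
    // the transaction for the next block.
    bool insert_complete(uint64_t attestation_height, transaction tx) {
        if (attestation_height + m_stxo_cache_depth < m_best_height) {
            return false; // attestation expired
        }
        for (const auto& inp : tx.m_inputs) {
            if (m_spent.count(inp) > 0) {
                return false; // input already spent
            }
        }
        for (const auto& inp : tx.m_inputs) {
            m_spent.insert(inp); // add inputs to the STXO cache
        }
        m_complete_txs.push_back(std::move(tx));
        return true;
    }

    // Step 6: swap out the accumulated transactions to form the next block,
    // leaving m_complete_txs empty for the next round without copying.
    std::vector<transaction> make_block() {
        std::vector<transaction> blk;
        std::swap(blk, m_complete_txs);
        m_best_height++;
        return blk;
    }

  private:
    uint64_t m_best_height{};
    uint64_t m_stxo_cache_depth{2};
    std::unordered_set<out_hash, out_hash_hasher> m_spent;
    std::vector<transaction> m_complete_txs;
};
```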

We have been moving the point where the transaction gets discarded further down this code path. We are currently at the point where the block and error data are returned from the Raft state machine (i.e., after step 7: whatever cbdc::atomizer::state_machine::commit returns is passed to the callback function of step 8).
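
For illustration, a self-contained sketch of this measurement technique (all names are made up, not taken from the benchmarking branch): a compile-time knob drops transactions after a chosen stage, so each stage's incremental cost shows up as the throughput difference between two builds run under the same benchmark harness.

```cpp
// Hypothetical sketch of the "movable discard point" technique; not the
// actual instrumentation from the branch.
#include <cstdint>
#include <cstdio>
#include <vector>

#ifndef DISCARD_AFTER_STAGE
#define DISCARD_AFTER_STAGE 0 // 0 = full path, 1 = after receive, 2 = after batching
#endif

static uint64_t g_received = 0;
static uint64_t g_committed = 0;
static std::vector<uint64_t> g_batch;

void on_tx_notify(uint64_t tx) {
    g_received++; // stage 1: transaction received
    if (DISCARD_AFTER_STAGE == 1) { return; }

    g_batch.push_back(tx); // stage 2: batch for the state machine
    if (g_batch.size() < 1000) { return; }
    if (DISCARD_AFTER_STAGE == 2) { g_batch.clear(); return; }

    for (auto t : g_batch) { g_committed += (t != 0); } // stage 3: "commit"
    g_batch.clear();
}

int main() {
    // A leftover partial batch at the end is ignored in this sketch.
    for (uint64_t i = 0; i < 10'000'000; i++) { on_tx_notify(i); }
    std::printf("received=%llu committed=%llu\n",
                static_cast<unsigned long long>(g_received),
                static_cast<unsigned long long>(g_committed));
    return 0;
}
```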

Starting from the baseline test, commit 9fa80fb, we disabled returning block transactions and errors from the Raft state machine in commit 01e578c.

This raises the Atomizer's peak throughput from 170k TX/s to 250k TX/s, but errors are no longer reported to the watchtowers and blocks contain only the height.

After this analysis, our assumption is that the bottleneck lies either in the Raft serialization (here) / storage (here), or in the callback functions that process the return value (here) and broadcast it to the shards and watchtowers.

Further analysis is needed to pinpoint the exact problem; then we can work on a solution that resolves it.
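
One low-effort way to narrow this down could be to time each suspect stage directly. A minimal RAII-timer sketch (not from the repo; the comments indicate where it would wrap the real serialization and broadcast calls):

```cpp
#include <chrono>
#include <cstdio>

// Prints the wall-clock time spent in a scope. In a real benchmark, the
// samples should be aggregated into a histogram instead of printed per call.
struct stage_timer {
    const char* m_name;
    std::chrono::steady_clock::time_point m_start =
        std::chrono::steady_clock::now();
    ~stage_timer() {
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - m_start)
                      .count();
        std::printf("%s: %lld ns\n", m_name, static_cast<long long>(ns));
    }
};

int main() {
    {
        stage_timer t{"serialize"};
        // Raft serialization of the commit result would go here.
    }
    {
        stage_timer t{"broadcast"};
        // m_atomizer_network.broadcast(...) to shards/watchtowers would go here.
    }
    return 0;
}
```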

Possible next steps

Things that can be tried to further pinpoint the issue:

Possible Difficulties

No response

Prior Work

No response

Code of Conduct

wadagso-gertjaap commented 2 years ago

I started a few of the possible next steps in this branch: specifically, I switched to an in-memory Raft store, returned only block heights from the Raft state machine (fetching the full blocks from the controller), and moved publishing them to a separate thread.
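
For readers following along, the publishing-thread idea reduces to a standard producer/consumer handoff. A self-contained sketch with made-up names (the actual change is in the linked branch): the Raft result callback only enqueues the block, and a dedicated thread performs the potentially slow network broadcast.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using block = std::vector<uint8_t>; // stand-in for the serialized block

class block_publisher {
  public:
    block_publisher() : m_thread([this] { run(); }) {}
    ~block_publisher() {
        {
            std::lock_guard<std::mutex> lk(m_mut);
            m_done = true;
        }
        m_cv.notify_one();
        m_thread.join();
    }

    // Called from the Raft result callback (step 8): O(1) handoff.
    void publish(block blk) {
        {
            std::lock_guard<std::mutex> lk(m_mut);
            m_queue.push(std::move(blk));
        }
        m_cv.notify_one();
    }

  private:
    void run() {
        std::unique_lock<std::mutex> lk(m_mut);
        for (;;) {
            m_cv.wait(lk, [this] { return m_done || !m_queue.empty(); });
            if (m_queue.empty()) {
                return; // done and drained
            }
            auto blk = std::move(m_queue.front());
            m_queue.pop();
            lk.unlock();
            // The broadcast to watchtowers and shards would go here,
            // off the commit path.
            lk.lock();
        }
    }

    std::mutex m_mut;
    std::condition_variable m_cv;
    std::queue<block> m_queue;
    bool m_done{false};
    std::thread m_thread; // declared last so the members above exist first
};

int main() {
    block_publisher pub;
    pub.publish(block(1024, 0xab)); // enqueue a dummy 1 KiB "block"
    return 0;
}
```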

The difference between these two commits is astonishing. The only change is that one doesn't actually broadcast the blocks over the network layer, while the other does, via m_atomizer_network.broadcast(...).

https://github.com/mit-dci/opencbdc-tx/commit/85a1bf87c613a4d36da914d9d2daeda079e415ec (350k TX/s peak)
https://github.com/mit-dci/opencbdc-tx/commit/55a49ac4ba55afd7438ec9c4ace5cfb6e4fa9dd1 (200k TX/s peak)

(throughput graphs for both commits were attached here)

This seems to indicate some bottleneck in the network stack. I don't understand why enabling this broadcasting should slow the whole system down, especially since it is running on a separate thread.
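
One hypothesis (mine; not verified against the network code): even with the broadcast on a separate thread, any per-block serialization or per-peer copying done before or during the handoff still consumes memory bandwidth and allocator time that the hot path shares. As a rough feel for that cost, a standalone sketch that copies an assumed 8 MiB serialized block in a loop:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Assumption for illustration only: an ~8 MiB serialized block and one
    // copy per peer inside the broadcast path.
    constexpr std::size_t block_bytes = 8 * 1024 * 1024;
    const std::vector<unsigned char> blk(block_bytes, 0xab);

    constexpr int iters = 100;
    std::size_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++) {
        std::vector<unsigned char> per_peer = blk; // hypothetical per-peer copy
        sink += per_peer[static_cast<std::size_t>(i) % block_bytes];
    }
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                  std::chrono::steady_clock::now() - start)
                  .count();
    std::printf("avg copy cost: %lld us (sink=%zu)\n",
                static_cast<long long>(us / iters), sink);
    return 0;
}
```

If copies of that size at block frequency turn out to account for a meaningful share of the budget, sharing one immutable buffer across all peers would be a candidate fix.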