tendermint / tendermint

⟁ Tendermint Core (BFT Consensus) in Go
https://tendermint.com/
Apache License 2.0

Broadcasting 200-500kb txs on v0.35.6 vs v0.34.20 #9220

Closed: Bidon15 closed this issue 1 year ago

Bidon15 commented 2 years ago

Introduction

At Celestia we rely on the ability to generate and broadcast large transactions. Using testground, we emulated a scenario in which a set of validators in the network each generate and broadcast a transaction of up to 500kb that should be included in the next block (e.g. 3 validators each broadcasting a 500kb tx results in a next block of ~1.5mb).

Below we show how the same test scenario and environment produced different outcomes on tendermint v0.35.6(*) and on v0.34.20, the latest v0.34.x release, to which we downgraded.

(*) Our v0.35.6 fork adds two ABCI++ methods we need, as well as this change: https://github.com/celestiaorg/celestia-core/pull/793

Environment

| Env | Tendermint version | Cosmos SDK version |
|---|---|---|
| 1 | v0.35.6 | v0.46.0-beta2 |
| 2 | v0.34.20 | v0.46.0 |

Testground Network Configuration

Bandwidth: 100 Mib/s and 256 Mib/s

Latency: 0ms

config.toml settings for each validator

Mempool

max_txs_bytes 1073741824
max_tx_bytes 1048576
size 5000
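The three mempool limits act at different levels: max_tx_bytes caps a single tx, while max_txs_bytes and size cap the whole mempool by total bytes and by tx count (1073741824 bytes is 1 GiB and 1048576 bytes is 1 MiB). The sketch below is a simplified, hypothetical model of those admission checks using the values above; the type, method and error names are ours, not Tendermint's.

```go
package main

import (
	"errors"
	"fmt"
)

// mempoolLimits mirrors the three [mempool] settings from config.toml.
// This is a simplified model for illustration, not Tendermint's code.
type mempoolLimits struct {
	MaxTxsBytes int64 // total bytes of all txs in the mempool
	MaxTxBytes  int   // bytes of a single tx
	Size        int   // number of txs in the mempool
}

var (
	errTxTooLarge  = errors.New("tx exceeds max_tx_bytes")
	errMempoolFull = errors.New("mempool is full")
)

// admit reports whether a tx of txSize bytes would be accepted by a mempool
// that already holds count txs totalling totalBytes.
func (l mempoolLimits) admit(txSize, count int, totalBytes int64) error {
	if txSize > l.MaxTxBytes {
		return errTxTooLarge
	}
	if count >= l.Size || totalBytes+int64(txSize) > l.MaxTxsBytes {
		return errMempoolFull
	}
	return nil
}

func main() {
	limits := mempoolLimits{MaxTxsBytes: 1 << 30, MaxTxBytes: 1 << 20, Size: 5000}
	// Three 500kb txs, one per validator, arriving in turn.
	var total int64
	for i := 0; i < 3; i++ {
		err := limits.admit(500*1000, i, total)
		fmt.Printf("tx %d: err=%v\n", i, err)
		if err == nil {
			total += 500 * 1000
		}
	}
}
```

With these settings, none of the limits come close to rejecting three (or twenty) txs of this size.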

Consensus

timeout_propose 3 sec
timeout_prevote 1 sec
timeout_precommit 1 sec
timeout_commit 30 sec

RPC

timeout_broadcast_tx_commit 40 sec
max_body_bytes 1000000
max_header_bytes 1048576
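max_body_bytes limits the size of an incoming RPC request body, and a tx submitted over JSON-RPC travels inside a JSON envelope rather than as raw bytes. The sketch below estimates the request size assuming the tx bytes are base64-encoded in the JSON body, which is how Go's encoding/json serializes []byte. The broadcastRequest struct is hypothetical and only approximates what the real client sends. The limit is max_body_bytes = 1000000 from the table above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// broadcastRequest is a hypothetical approximation of a JSON-RPC
// broadcast_tx_commit request; encoding/json encodes []byte as base64.
type broadcastRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  struct {
		Tx []byte `json:"tx"`
	} `json:"params"`
}

// requestSize returns the length of the JSON body for a tx of txSize bytes.
func requestSize(txSize int) (int, error) {
	req := broadcastRequest{JSONRPC: "2.0", ID: 1, Method: "broadcast_tx_commit"}
	req.Params.Tx = make([]byte, txSize)
	body, err := json.Marshal(req)
	return len(body), err
}

func main() {
	const maxBodyBytes = 1000000 // [rpc] max_body_bytes from the issue
	for _, size := range []int{200 * 1000, 500 * 1000, 1048576} {
		n, err := requestSize(size)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tx %7d bytes -> body %7d bytes, fits: %v\n", size, n, n <= maxBodyBytes)
	}
}
```

Under these assumptions a 500kb tx fits comfortably, while a tx at the 1048576-byte max_tx_bytes limit would exceed max_body_bytes once base64-encoded.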

Notes:

  1. In v0.35.x, the underscores in config keys are replaced with dashes (e.g. max_txs_bytes becomes max-txs-bytes). I've kept the v0.34.x style in the tables above.
  2. In v0.35.x, the new mode config option is set to “validator”.
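For the keys listed above, the naming difference in note 1 is mechanical: a v0.34.x key maps to its v0.35.x form by replacing underscores with dashes. A tiny sketch (`toV035Key` is a hypothetical helper; it does not handle any key that was renamed outright in v0.35):

```go
package main

import (
	"fmt"
	"strings"
)

// toV035Key converts a v0.34.x config.toml key (snake_case) to the
// v0.35.x style (kebab-case), e.g. max_txs_bytes -> max-txs-bytes.
func toV035Key(key string) string {
	return strings.ReplaceAll(key, "_", "-")
}

func main() {
	for _, k := range []string{"max_txs_bytes", "timeout_commit", "max_body_bytes"} {
		fmt.Printf("%s -> %s\n", k, toV035Key(k))
	}
}
```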

Test Scenario

Pre-Requisites:

  1. Cobra commands are used to interact with each validator (**)
  2. Each validator starts from the genesis block
  3. Connections are established between all validators
  4. One block is produced

(**) That is, we interact with the node's BL as a node operator would, using CLI commands.

Steps:

  1. Each validator generates a random 500kb tx
  2. Each validator executes tx.GenerateOrBroadcastTxCLI with the 500kb data and:
    1. the flag -b block (broadcast mode "block")
  3. Each validator waits up to 5 minutes for the tx to be included in the next block

Expected Results:

Actual Results

v0.35.x

| Number of validators | TX size | Next block size |
|---|---|---|
| 3 | 500kb | ~1.5Mb |
| 20 | 200kb | ~2Mb |

The whole chain gets stuck cycling through rounds for 5 minutes, without either:

v0.34.x

| Number of validators | TX size | Next block size |
|---|---|---|
| 3 | 500kb | ~1.5Mb |
| 20 | 200kb | ~2Mb |

After successfully downgrading to the latest v0.34.x release, we observed:

More Info:

Logs from testground and each of the validators can be found in this issue: https://github.com/celestiaorg/celestia-app/issues/563

In addition, we are continuing to investigate on our side to determine whether the root cause lies in our fork, and will provide more data here: https://github.com/celestiaorg/celestia-core/issues/814

thanethomson commented 2 years ago

Given #9155, is this issue still relevant?

sergio-mena commented 1 year ago

@Bidon15 Further to @thanethomson's comment, is the bad behaviour seen in v0.37.x (or in a later release of v0.34.x)? Note that all releases in the v0.35.x branch have been retracted as @thanethomson is pointing out. If this bad behaviour is only seen in v0.35.x we will close this issue. Feel free to re-open in the future if you see it happening in v0.34.x, v0.37.x or any future release.

Bidon15 commented 1 year ago

Hey @sergio-mena

On v0.34.x the issue is not reproducible. Can't say for v0.37.x.

thanethomson commented 1 year ago

Cool, thanks @Bidon15. We'll close this issue for now then. If you see this happen again, please feel free to reopen it.