hyperledger-iroha / iroha

Iroha - A simple, enterprise-grade decentralized ledger
https://wiki.hyperledger.org/display/iroha
Apache License 2.0
438 stars 280 forks source link

[BUG] [perf] Stability testing, out of memory exeption #5083

Open timofeevmd opened 1 month ago

timofeevmd commented 1 month ago

OS and Environment

linux, k8s

GIT commit hash

995da4ec

Minimum working example / Steps to reproduce

java-sdk commit: 9aba55183e

reproduce command: ./mvnw gatling:test -DtargetURL=https://iroha2.test.tachi.soramitsu.co.jp -DremoteLogin= -DremotePassword= -Dintensity=800000 -DrampDuration=3600 -DmaxDuration=3601

full performance report

container logs: OOM.log.zip

Under a load of 1100 TPS, the system is able to function for only 44 minutes whitout any arrors and rejected transaction before encountering issues. (graph 1). Following this period, a memory leak. (graph 2) .

graph 1

image-20240917-103749

graph 2

image-20240917-104103

Actual result

the product works stably

Expected result

after 45 minutes working catch "out if memory" exception

Logs

Log contents

Who can help to reproduce?

@timofeevmd

Notes

No response

Mingela commented 1 month ago

@timofeevmd What kind of transactions are being sent? In case there are new entities being registered (like asset definitions, accounts, domains) those consume memory naturally so the leak have to be justified.

Mingela commented 1 month ago

UPD; found out that's an asset transfer, though transactions (and blocks) itself consume memory so it's up to the team to analyze the consumption and conclude whether it's reasonable, thanks

dima74 commented 1 month ago

Based on my testing, size of a transaction with a single transfer asset instruction is ~1.8KB (CommittedTransaction struct). So for 1 million transactions iroha will consume ~1.8GB, for 3 million transactions ~5.6GB. This matches results in this issue.

I see two options how memory consumption can be improved:

dima74 commented 1 month ago

Analysis of CommittedTransaction struct memory layout.

Key results:

Possible optimizations:

// self: 336
// heap: 1100 + instructions heap
struct CommittedTransaction {
    // self: 136
    // heap: 1100 + instructions heap
    value: SignedTransaction,
    // self: 200
    error: Option<error::TransactionRejectionReason>,
}

// self: 136
// heap: 1100 + instructions heap
struct SignedTransactionV1 {
    // self: 16
    // heap: 64
    signature: TransactionSignature,
    // self: 120
    // heap: 1036 + instructions heap
    payload: TransactionPayload,
}

// self: 120
// heap: 1036 + instructions heap
struct TransactionPayload {
    // self: 16, heap: 36 for default chain id
    chain: ChainId,
    // self: 24, heap: ~296
    authority: AccountId,
    // self: 8
    creation_time_ms: u64,
    // self: 32,
    // heap: 704 + instructions heap
    instructions: Executable,
    // self: 8
    time_to_live_ms: Option<NonZeroU64>,
    // self: 4
    nonce: Option<NonZeroU32>,
    // self: 24, heap: 0 if empty
    metadata: Metadata,
}

// self: 32
// heap: 176 * (instructions capacity) + instructions heap
//
// for a transaction with a single transfer instruction (4 is min Vec capacity):
// heap: 176 * 4 + ~592
enum Executable {
    Instructions(Vec<InstructionBox>),
    Wasm(WasmSmartContract),
}

// self: 176
enum InstructionBox {
    ...
}

// self: 96
// heap: ~592
struct Transfer<Asset, Numeric, Account> {
    // self: 56, heap: ~296
    source: AssetId,
    // self: 16
    object: Numeric,
    // self: 24, heap: ~296
    destination: AccountId,
}

// self: 56, heap: ~296
struct AssetId {
    // self: 24, heap: ~296
    account: AccountId,
    // self: 32, heap: 0 for domain and asset name <16
    definition: AssetDefinitionId,
}

// self: 24, heap: ~296 heap
struct AccountId {
    // self: 16, heap: 0 for domains with len <16
    domain: DomainId,
    // self: 8, heap: 296
    signatory: PublicKey,
}

// 296
pub enum PublicKeyInner {
    // 192
    Ed25519(ed25519::PublicKey),
    // 104
    Secp256k1(secp256k1::PublicKey),
    // 144
    BlsNormal(bls::BlsNormalPublicKey),
    // 288
    BlsSmall(bls::BlsSmallPublicKey),
}
DCNick3 commented 1 month ago

Use read-only Box<[InstructionBox]> instead of Vec in Executable? Minimal capacity of a Vec is 4 elements, so for a transaction with a single instruction we have 176*3=528 unused space for a buffer

Huh, I didn't know of such behavior. Notably, when doing vec![123], it allocates the capacity exactly, but with let mut v = Vec::new(); v.push(123); it does indeed overallocate the capacity to 4 to amortize future pushes.

(We also have a ConstVec<T> newtype around Box<[T]> to do the right thing with the scale encoding)

dima74 commented 1 month ago

Actually, we still have capacity=1 since we clone transactions in Sumeragi::try_create_block (this is a single peer case, not sure about multiple peers). Anyway would be nice to replace it with Box<[T]> to have explicit behaviour. Opened #5092

fn main() {
    let mut v = Vec::new();
    v.push(1);

    println!("{}", v.capacity());  // 4

    let v2 = v.clone();
    println!("{}", v2.capacity());  // 1
}
dima74 commented 4 weeks ago

With #5096 and #5103 merged, memory usage expected to be the following:

So after 1mln transactions iroha should consume ~450MB, after 10mln transactions ~2.7GB.


State::transactions is map from HashOf<SignedTransaction> to NonZeroUsize, so entry size is 40 bytes (32+8). But this is BTreeMap, so some keys are stored multiple times in intermediate nodes. Also nodes store some data related to concread crate. This gives total ~260 bytes per transaction. Not sure how this can be optimized further.