Open popcnt1 opened 9 months ago
Moved tx building out of the measured execution path: https://github.com/rooch-network/rooch/pull/1390
Huge improvement (the measurement is now closer to the real server-side execution overhead); the major remaining cost is from tokio wake-ups.
RUST_LOG=error cargo bench --bench bench_tx_write
Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
Finished bench [optimized + debuginfo] target(s) in 12.05s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx time: [388.70 µs 389.91 µs 391.12 µs]
change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
Performance has improved.
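To put the reported change in absolute terms, the previous run's time can be estimated from the current point estimate and the relative change. This is a minimal sketch: the helper name `previous_time` is made up for illustration, and dividing point estimates only approximates what Criterion computes from the full sample distributions.

```python
def previous_time(current: float, change_pct: float) -> float:
    """Invert a relative change: current = previous * (1 + change_pct / 100)."""
    return current / (1.0 + change_pct / 100.0)


current_us = 389.91    # point estimate of execute_tx from the run above
change_pct = -97.059   # point estimate of the reported change

baseline_us = previous_time(current_us, change_pct)
print(f"implied previous time: {baseline_us / 1000:.2f} ms")
```

Under these assumptions the earlier run took roughly 13 ms per execute_tx.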
However, I cannot capture the execute fn in the flamegraph, which is odd (only the validate process shows up).
This is why the flamegraph shows no execution, only validation:
2024-02-24T09:13:52.654427Z DEBUG rooch_rpc_server::service::rpc_service: Failed to validate transaction: VMError with status LINKER_ERROR at location UNDEFINED and message Cannot find ModuleId { address: 893046799b1f3d94c73e4b5b7769e193fc0acbe8c149ee058b3df069ca745226, name: Identifier("simple_blog") } in data cache
Fixed the link error: https://github.com/rooch-network/rooch/pull/1392
fixed in: https://github.com/rooch-network/rooch/pull/1392
RUST_LOG=error cargo bench --bench bench_tx_write
Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
Finished bench [optimized + debuginfo] target(s) in 13.62s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx time: [5.5210 ms 6.0733 ms 6.3923 ms]
change: [-12.752% -2.5564% +8.6448%] (p = 0.66 > 0.05)
No change in performance detected.
With an in-memory filesystem (RAM disk):
RUST_LOG=error PUREMEM=/Volumes/RAMDisk/rtmp cargo bench --bench bench_tx_write
Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
warning: unused import: `std::sync::Arc`
--> crates/rooch-benchmarks/src/tx.rs:44:5
|
44 | use std::sync::Arc;
| ^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `tempfile::TempDir`
--> crates/rooch-benchmarks/src/tx.rs:46:5
|
46 | use tempfile::TempDir;
| ^^^^^^^^^^^^^^^^^
warning: `rooch-benchmarks` (lib) generated 2 warnings (run `cargo fix --lib -p rooch-benchmarks` to apply 2 suggestions)
Finished bench [optimized + debuginfo] target(s) in 13.26s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-9809035715c4c70d)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx time: [5.1027 ms 5.6767 ms 5.9972 ms]
change: [-13.073% -0.8050% +12.977%] (p = 0.91 > 0.05)
No change in performance detected.
Running the bench with tx_data_size = 0 shows no significant improvement:
RUST_LOG=error TX_SIZE=0 PUREMEM=/Volumes/RAMDisk/rtmp cargo bench --bench bench_tx_write
Finished bench [optimized + debuginfo] target(s) in 1m 37s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-9809035715c4c70d)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx time: [5.0727 ms 5.6405 ms 5.9602 ms]
change: [-12.465% -0.5425% +11.901%] (p = 0.93 > 0.05)
No change in performance detected.
256 bytes tx data size with sync option for rocksdb:
RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write
execute_tx time: [5.5854 ms 6.1576 ms 6.5102 ms]
change: [-35.667% -15.056% +14.721%] (p = 0.38 > 0.05)
No change in performance detected.
256 bytes tx data size with no indexer, sync for rocksdb:
RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write
execute_tx time: [2.6807 ms 3.0280 ms 3.2322 ms]
change: [-5.5588% +9.1633% +27.002%] (p = 0.29 > 0.05)
No change in performance detected.
256 bytes tx data size with no indexer, no sync for rocksdb:
RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write
execute_tx time: [2.4587 ms 2.8583 ms 3.0955 ms]
change: [+11.088% +47.174% +123.71%] (p = 0.02 < 0.05)
Performance has regressed.
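The verdict lines in these logs come from Criterion comparing each run against the previously saved run of the same benchmark, which is not necessarily the configuration quoted just above it in this thread. The sketch below reproduces the decision rule as I understand it; the 0.05 significance level is printed in the logs, while the 1% noise threshold and the function name `verdict` are assumptions, not taken from this issue.

```python
SIGNIFICANCE_LEVEL = 0.05   # matches the "p = ... 0.05" shown in the logs
NOISE_THRESHOLD_PCT = 1.0   # assumption: Criterion's default noise threshold


def verdict(ci_low_pct: float, ci_high_pct: float, p_value: float) -> str:
    """Classify a run from the confidence interval of its relative change."""
    if p_value >= SIGNIFICANCE_LEVEL:
        # The difference from the saved run is not statistically significant.
        return "No change in performance detected."
    if ci_high_pct < -NOISE_THRESHOLD_PCT:
        return "Performance has improved."
    if ci_low_pct > NOISE_THRESHOLD_PCT:
        return "Performance has regressed."
    return "Change within noise threshold."


# (lower bound %, upper bound %, p-value) copied from runs in this thread.
runs = {
    "build tx moved out": (-97.069, -97.049, 0.00),
    "sync rocksdb": (-35.667, 14.721, 0.38),
    "no indexer, no sync": (11.088, 123.71, 0.02),
}
for name, (low, high, p) in runs.items():
    print(f"{name}: {verdict(low, high, p)}")
```

With only 10 measurements per run the intervals are wide, so a "regressed" verdict such as the +11% to +124% interval above is better read as a prompt to re-run than as a firm result.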
without indexer updates
transfer tx:
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx time: [3.0738 ms 3.1135 ms 3.1530 ms]
change: [+0.6383% +8.0757% +19.114%] (p = 0.14 > 0.05)
No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
transfer tx with mem filesystem:
RUST_LOG=error DATA_DIR=/volumes/RAMDisk TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx time: [2.8923 ms 2.9551 ms 2.9870 ms]
change: [-8.7169% -6.1064% -3.4318%] (p = 0.00 < 0.05)
Performance has improved.
empty tx:
RUST_LOG=error TX_TYPE=empty cargo bench --bench bench_tx_write
Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
Finished bench [optimized + debuginfo] target(s) in 13.77s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-b8caaaf38acbd4cd)
execute_tx time: [1.6870 ms 1.7017 ms 1.7141 ms]
change: [+7.7499% +8.5624% +9.4147%] (p = 0.00 < 0.05)
Performance has regressed.
empty tx with mem filesystem:
RUST_LOG=error DATA_DIR=/volumes/RAMDisk TX_TYPE=empty cargo bench --bench bench_tx_write
Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
Finished bench [optimized + debuginfo] target(s) in 15.92s
Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-b8caaaf38acbd4cd)
execute_tx time: [1.5552 ms 1.5570 ms 1.5580 ms]
change: [-1.5589% +0.0792% +2.9995%] (p = 0.95 > 0.05)
No change in performance detected.
Found 3 outliers among 10 measurements (30.00%)
2 (20.00%) low severe
1 (10.00%) high severe
For no_sync rocksdb (put & write_batch), I logged the cost of each rocksdb operation and used a script to summarize it:
./get_time.sh
Total put operations: 8016
Total time elapsed in put operations (in ns): 49929940
Average time per put operation (in ns): 6228.78
Total bytes in put operations: 1431756
Average bytes per put operation: 178.612
Average number of put operations per execute_tx: 5.00062
Average time of put operations per execute_tx (in ns): 31147.8
---------
Total get operations: 128334
Total time elapsed in get operations (in ns): 103305590
Average time per get operation (in ns): 804.974
Total bytes in get operations: 43787172
Average bytes per get operation: 341.197
Average number of get operations per execute_tx: 80.0586
Average time of get operations per execute_tx (in ns): 64445.2
---------
Total write_batch operations: 4834
Total time elapsed in write_batch operations (in ns): 3.53812e+07
Average time per write_batch operation (in ns): 7319.23
Total bytes in write_batch operations: 865968
Average bytes per write_batch operation: 179.141
Average number of write_batch operations per execute_tx: 3.0156
Average time of write_batch operations per execute_tx (in ns): 22071.8
---------
For no_sync rocksdb (put & write_batch), I logged the cost of each rocksdb operation and used a script to summarize it (187us/tx):
./get_time.sh
Total put operations: 8016
Total time elapsed in put operations (in µs): 44234.4
Average time per put operation (in µs): 5.51827
Total bytes in put operations: 1431756
Average bytes per put operation: 178.612
Average number of put operations per execute_tx: 5.00062
Average time of put operations per execute_tx (in µs): 27.5948
---------
Total get operations: 128334
Total time elapsed in get operations (in µs): 103229
Average time per get operation (in µs): 0.80438
Total bytes in get operations: 43787172
Average bytes per get operation: 341.197
Average number of get operations per execute_tx: 80.0586
Average time of get operations per execute_tx (in µs): 64.3976
---------
Total write_batch operations: 4834
Total time elapsed in write_batch operations (in µs): 39028
Average time per write_batch operation (in µs): 8.07365
Total bytes in write_batch operations: 865968
Average bytes per write_batch operation: 179.141
Average number of write_batch operations per execute_tx: 3.0156
Average time of write_batch operations per execute_tx (in µs): 24.3469
---------
Total put_sync operations: 3209
Total time elapsed in put_sync operations (in µs): 116106
Average time per put_sync operation (in µs): 36.1815
Total bytes in put_sync operations: 167014
Average bytes per put_sync operation: 52.0455
Average number of put_sync operations per execute_tx: 2.00187
Average time of put_sync operations per execute_tx (in µs): 72.4307
---------
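The per-transaction figures above follow from the totals by simple division. The sketch below recomputes them; it is not the `get_time.sh` script itself, and the transaction count of 1603 is an assumption inferred by dividing each operation count by its reported "operations per execute_tx" value.

```python
# (operation count, total elapsed time in µs), copied from the summary above.
totals = {
    "put": (8016, 44234.4),
    "get": (128334, 103229.0),
    "write_batch": (4834, 39028.0),
    "put_sync": (3209, 116106.0),
}

# Assumption: inferred from e.g. 8016 puts / 5.00062 puts per execute_tx.
TX_COUNT = 1603


def per_tx_cost(op_totals: dict, tx_count: int) -> dict:
    """Average time in µs that each operation type contributes to one tx."""
    return {op: total_us / tx_count for op, (_, total_us) in op_totals.items()}


costs = per_tx_cost(totals, TX_COUNT)
for op, us in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{op:12s} {us:8.2f} µs/tx")
print(f"{'total':12s} {sum(costs.values()):8.2f} µs/tx")
```

The sum comes out close to the per-tx figure quoted above, and it shows that put_sync, at about 2 operations per tx, costs more per transaction than the roughly 80 gets.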
RUST_LOG=error TX_TYPE=empty cargo bench --bench bench_tx_write
Finished bench [optimized] target(s) in 0.73s
Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-9a00dff23725eca7)
execute_tx time: [1.9076 ms 1.9201 ms 1.9316 ms]
change: [-2.3943% -0.6263% +0.8086%] (p = 0.53 > 0.05)
No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
Finished bench [optimized] target(s) in 0.39s
Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-9a00dff23725eca7)
execute_tx time: [3.0622 ms 3.1346 ms 3.1687 ms]
change: [-5.9144% -2.1965% +1.5103%] (p = 0.30 > 0.05)
No change in performance detected.
After changing the default write options (set_sync -> false):
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx time: [2.8585 ms 2.9698 ms 3.0495 ms]
change: [-7.9807% -1.4107% +4.2084%] (p = 0.72 > 0.05)
No change in performance detected.
Upgraded rocksdb v0.21.0 -> v0.22.0 and removed multithread cf:
Finished bench [optimized] target(s) in 0.41s
Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-892d2f3518007bf1)
execute_tx time: [2.8199 ms 2.8966 ms 2.9364 ms]
change: [-4.1880% -1.1769% +2.1003%] (p = 0.50 > 0.05)
No change in performance detected.
on platform2:
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx time: [2.8009 ms 2.8532 ms 2.8790 ms]
change: [-3.1362% -0.7240% +1.6901%] (p = 0.58 > 0.05)
No change in performance detected.
Buffer node writes when applying the change set (https://github.com/popcnt-subodhi/rooch/commit/660cd43f20d7583a05d77b178207f62d4eb0734f):
on platform2:
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
Finished bench [optimized] target(s) in 0.30s
Running benches/bench_tx_write.rs (/home/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-f73555ca8f703ae2)
execute_tx time: [2.7433 ms 2.7940 ms 2.8186 ms]
change: [-2.9591% -0.2739% +2.1993%] (p = 0.85 > 0.05)
No change in performance detected.
Executing an L1 block tx for BTC blocks after height 800,000:
| data import mode | avg cost |
|---|---|
| none | 4.4906s |
| ord | 5.7325s |
| utxo | 4.4913s |
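To make the table easier to compare, the overhead of each data import mode relative to `none` can be computed directly from the averages. This is only arithmetic on the numbers above; the helper name `overhead_pct` is made up for illustration.

```python
# Average cost in seconds per data import mode, copied from the table above.
avg_cost_s = {"none": 4.4906, "ord": 5.7325, "utxo": 4.4913}


def overhead_pct(cost: float, baseline: float) -> float:
    """Relative extra cost over the baseline, in percent."""
    return (cost / baseline - 1.0) * 100.0


baseline = avg_cost_s["none"]
for mode, cost in avg_cost_s.items():
    print(f"{mode:5s} {cost:.4f} s  {overhead_pct(cost, baseline):+.2f}%")
```

On these numbers the ord mode adds a bit under 28% per block, while the utxo mode is indistinguishable from none.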
For BTC blocks after height 200,000, the average cost is 132.46ms.
Effect of the build profile:
tx_exec_blk time: [4.0600 s 4.4255 s 4.8032 s]
change: [-13.149% -1.5229% +10.922%] (p = 0.81 > 0.05)
No change in performance detected.
tx_exec_blk time: [3.5628 s 3.9052 s 4.2501 s]
change: [-22.557% -11.757% -0.5718%] (p = 0.07 > 0.05)
No change in performance detected.
tx_exec_blk time: [3.7183 s 4.0718 s 4.4269 s]
change: [-7.5759% +4.2659% +18.680%] (p = 0.54 > 0.05)
No change in performance detected.
| bench | avg cost |
|---|---|
| exec empty | 1.0313ms |
| exec transfer | 1.5611ms |
| exec btc block (>= 800,000 height) | 2.384s |
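For a rough sense of scale, the average costs above can be turned into a rate. This sketch assumes strictly serial execution with nothing else on the critical path, so it is an upper bound derived from the microbenchmark and not a measured throughput.

```python
# Average cost in seconds per benchmark, copied from the table above.
avg_cost_s = {
    "exec empty": 1.0313e-3,
    "exec transfer": 1.5611e-3,
    "exec btc block (>= 800,000 height)": 2.384,
}


def per_second(avg_cost_seconds: float) -> float:
    """Items processed per second if each one takes avg_cost_seconds, serially."""
    return 1.0 / avg_cost_seconds


rates = {name: per_second(cost) for name, cost in avg_cost_s.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per second")
```

Under that assumption, one execution thread handles on the order of 970 empty or 640 transfer transactions per second, and roughly 0.42 such BTC blocks per second.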
Platform 1:
MacBook Apple M2 24GB 512GB
disk:
mem filesystem
original result (updated: https://github.com/rooch-network/rooch/pull/1392)
Platform 2:
Fedora 39
Intel 12700K with Optane 905
Platform 3:
Google Cloud with 2TB persistent SSD
Platform 4:
alicloud
Platform 5:
Fedora 39
Intel 12700K + Solidigm D5-P5430