rooch-network / rooch

VApp Container with Move Language
https://rooch.network
Apache License 2.0
128 stars 54 forks source link

[perf]: Performance tuning log #1383

Open popcnt1 opened 2 months ago

popcnt1 commented 2 months ago

platform1:

MacBook Apple M2 24GB 512GB

disk:

fio --name=random_write_latency_test --filename=rand_write_test --filesize=10G --time_based=1 --ramp_time=30s --runtime=1m --ioengine=posixaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.36
Starting 1 thread
random_write_latency_test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=26.5MiB/s][w=6796 IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=3075: Mon Feb 26 10:15:55 2024
  write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(1637MiB/60001msec); 0 zone resets
    slat (nsec): min=0, max=67000, avg=700.14, stdev=872.37
    clat (usec): min=7, max=7074, avg=142.25, stdev=224.22
     lat (usec): min=7, max=7074, avg=142.95, stdev=224.36
    clat percentiles (usec):
     |  1.00th=[   87],  5.00th=[   91], 10.00th=[   93], 20.00th=[   95],
     | 30.00th=[   97], 40.00th=[   99], 50.00th=[  103], 60.00th=[  109],
     | 70.00th=[  122], 80.00th=[  127], 90.00th=[  135], 95.00th=[  145],
     | 99.00th=[ 1647], 99.50th=[ 1680], 99.90th=[ 1745], 99.95th=[ 1778],
     | 99.99th=[ 1876]
    lat percentiles (usec):
     |  1.00th=[   88],  5.00th=[   92], 10.00th=[   94], 20.00th=[   96],
     | 30.00th=[   98], 40.00th=[  100], 50.00th=[  103], 60.00th=[  110],
     | 70.00th=[  123], 80.00th=[  128], 90.00th=[  135], 95.00th=[  147],
     | 99.00th=[ 1647], 99.50th=[ 1696], 99.90th=[ 1745], 99.95th=[ 1778],
     | 99.99th=[ 1876]
   bw (  KiB/s): min=25203, max=29458, per=100.00%, avg=27959.66, stdev=524.94, samples=119
   iops        : min= 6300, max= 7364, avg=6989.58, stdev=131.24, samples=119
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=40.50%, 250=57.03%

mem filesystem

fio --name=random_write_latency_test --filename=rand_write_test --filesize=200M --time_based=1 --ramp_time=30s --runtime=1m --ioengine=posixaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.36
Starting 1 thread
random_write_latency_test: Laying out IO file (1 file / 200MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=176MiB/s][w=45.2k IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=2819: Tue Feb 27 00:18:06 2024
  write: IOPS=45.2k, BW=177MiB/s (185MB/s)(10.3GiB/60001msec); 0 zone resets
    slat (nsec): min=0, max=911000, avg=2641.40, stdev=2013.63
    clat (nsec): min=0, max=1019.0k, avg=19213.98, stdev=4345.28
     lat (usec): min=3, max=1022, avg=21.86, stdev= 4.82
    clat percentiles (nsec):
     |  1.00th=[ 5024],  5.00th=[11968], 10.00th=[15040], 20.00th=[16064],
     | 30.00th=[17024], 40.00th=[18048], 50.00th=[19072], 60.00th=[20096],
     | 70.00th=[21120], 80.00th=[22912], 90.00th=[23936], 95.00th=[24960],
     | 99.00th=[28032], 99.50th=[30080], 99.90th=[34048], 99.95th=[38144],
     | 99.99th=[48896]
    lat percentiles (nsec):
     |  1.00th=[ 7008],  5.00th=[16064], 10.00th=[17024], 20.00th=[18048],
     | 30.00th=[19072], 40.00th=[20096], 50.00th=[21888], 60.00th=[22912],
     | 70.00th=[24960], 80.00th=[27008], 90.00th=[28032], 95.00th=[28032],
     | 99.00th=[32128], 99.50th=[33024], 99.90th=[38144], 99.95th=[41216],
     | 99.99th=[51968]
   bw (  KiB/s): min=179027, max=184368, per=100.00%, avg=180911.83, stdev=992.36, samples=119
   iops        : min=44756, max=46092, avg=45227.61, stdev=248.16, samples=119
  lat (nsec)   : 2=0.06%
  lat (usec)   : 2=0.20%, 4=0.06%, 10=2.70%, 20=49.35%, 50=47.62%
  lat (usec)   : 100=0.01%, 250=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=6.38%, sys=13.64%, ctx=4108648, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2712161,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=177MiB/s (185MB/s), 177MiB/s-177MiB/s (185MB/s-185MB/s), io=10.3GiB (11.1GB), run=60001-60001msec

original result (updated: https://github.com/rooch-network/rooch/pull/1392)

BUILDING simple_blog
execute_tx              time:   [5.5210 ms 6.0733 ms 6.3923 ms]
                        change: [-12.752% -2.5564% +8.6448%] (p = 0.66 > 0.05)
                        No change in performance detected.

platform 2

Fedora39

12700k with optane905:

fio --name=random_write_latency_test --filename=rand_write_test --filesize=10G --time_based=1 --ramp_time=30s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.35
Starting 1 thread
random_write_latency_test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=368MiB/s][w=94.3k IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=24159: Sun Apr 21 02:02:27 2024
  write: IOPS=88.7k, BW=346MiB/s (363MB/s)(20.3GiB/60001msec); 0 zone resets
    slat (nsec): min=809, max=228619, avg=1015.68, stdev=278.03
    clat (nsec): min=235, max=4003.5k, avg=10082.54, stdev=14267.28
     lat (usec): min=9, max=4005, avg=11.10, stdev=14.28
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[    9], 10.00th=[    9], 20.00th=[   10],
     | 30.00th=[   10], 40.00th=[   10], 50.00th=[   10], 60.00th=[   10],
     | 70.00th=[   10], 80.00th=[   10], 90.00th=[   12], 95.00th=[   13],
     | 99.00th=[   15], 99.50th=[   35], 99.90th=[  129], 99.95th=[  310],
     | 99.99th=[  343]
    lat percentiles (usec):
     |  1.00th=[   10],  5.00th=[   10], 10.00th=[   10], 20.00th=[   10],
     | 30.00th=[   11], 40.00th=[   11], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   11], 80.00th=[   11], 90.00th=[   13], 95.00th=[   14],
     | 99.00th=[   16], 99.50th=[   36], 99.90th=[  133], 99.95th=[  314],
     | 99.99th=[  343]
   bw (  KiB/s): min=302248, max=377848, per=100.00%, avg=354862.81, stdev=19468.80, samples=120
   iops        : min=75564, max=94462, avg=88715.71, stdev=4867.16, samples=120
  lat (nsec)   : 250=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=80.69%, 20=18.63%, 50=0.56%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.09%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=3.91%, sys=13.64%, ctx=5322086, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5322032,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=346MiB/s (363MB/s), 346MiB/s-346MiB/s (363MB/s-363MB/s), io=20.3GiB (21.8GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=79/7737330, merge=0/438, ticks=2/87801, in_queue=87802, util=99.76%

Platform 3

Google Cloud with 2TB persistent SSD

sudo fio --name=random_write_latency_test --filename=write_test --filesize=2G --time_based=1 --ramp_time=3s --runtime=20s --ioengine=posixaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=5868KiB/s][w=1467 IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=26437: Thu Apr 11 12:25:19 2024
  write: IOPS=1430, BW=5722KiB/s (5859kB/s)(112MiB/20001msec); 0 zone resets
    slat (nsec): min=1021, max=37776, avg=4154.48, stdev=1152.85
    clat (usec): min=219, max=113935, avg=693.93, stdev=1277.30
     lat (usec): min=223, max=113939, avg=698.08, stdev=1277.28
    clat percentiles (usec):
     |  1.00th=[  251],  5.00th=[  277], 10.00th=[  297], 20.00th=[  338],
     | 30.00th=[  494], 40.00th=[  586], 50.00th=[  635], 60.00th=[  693],
     | 70.00th=[  791], 80.00th=[  930], 90.00th=[ 1090], 95.00th=[ 1287],
     | 99.00th=[ 1582], 99.50th=[ 1696], 99.90th=[ 2147], 99.95th=[ 6259],
     | 99.99th=[91751]
    lat percentiles (usec):
     |  1.00th=[  255],  5.00th=[  281], 10.00th=[  302], 20.00th=[  343],
     | 30.00th=[  498], 40.00th=[  594], 50.00th=[  644], 60.00th=[  701],
     | 70.00th=[  799], 80.00th=[  938], 90.00th=[ 1090], 95.00th=[ 1287],
     | 99.00th=[ 1582], 99.50th=[ 1696], 99.90th=[ 2147], 99.95th=[ 6259],
     | 99.99th=[91751]
   bw (  KiB/s): min= 4168, max= 7576, per=100.00%, avg=5723.72, stdev=469.48, samples=40
   iops        : min= 1042, max= 1894, avg=1430.80, stdev=117.37, samples=40
  lat (usec)   : 250=0.85%, 500=29.52%, 750=35.81%, 1000=18.83%
  lat (msec)   : 2=14.89%, 4=0.04%, 10=0.01%, 20=0.02%, 50=0.02%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.43%, sys=1.01%, ctx=28699, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,28609,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=5722KiB/s (5859kB/s), 5722KiB/s-5722KiB/s (5859kB/s-5859kB/s), io=112MiB (117MB), run=20001-20001msec

Disk stats (read/write):
  sdb: ios=58074/34227, merge=0/1781, ticks=38676/71788, in_queue=110465, util=100.00%

Platform 4

alicloud

sudo fio --name=random_write_latency_test --filename=write_test --filesize=2G --time_based=1 --ramp_time=3s --runtime=20s --ioengine=posixaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 thread
random_write_latency_test: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=11.0MiB/s][w=2811 IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=8418: Fri Apr 12 17:08:53 2024
  write: IOPS=2814, BW=11.0MiB/s (11.5MB/s)(220MiB/20001msec); 0 zone resets
    slat (nsec): min=450, max=13775, avg=924.47, stdev=321.14
    clat (usec): min=268, max=2899, avg=353.81, stdev=41.99
     lat (usec): min=269, max=2902, avg=354.74, stdev=42.00
    clat percentiles (usec):
     |  1.00th=[  334],  5.00th=[  338], 10.00th=[  338], 20.00th=[  338],
     | 30.00th=[  343], 40.00th=[  343], 50.00th=[  343], 60.00th=[  347],
     | 70.00th=[  347], 80.00th=[  351], 90.00th=[  371], 95.00th=[  429],
     | 99.00th=[  494], 99.50th=[  562], 99.90th=[  676], 99.95th=[  807],
     | 99.99th=[ 1860]
    lat percentiles (usec):
     |  1.00th=[  334],  5.00th=[  338], 10.00th=[  338], 20.00th=[  343],
     | 30.00th=[  343], 40.00th=[  343], 50.00th=[  347], 60.00th=[  347],
     | 70.00th=[  347], 80.00th=[  351], 90.00th=[  371], 95.00th=[  429],
     | 99.00th=[  494], 99.50th=[  562], 99.90th=[  676], 99.95th=[  807],
     | 99.99th=[ 1860]
   bw (  KiB/s): min=11040, max=11352, per=100.00%, avg=11259.35, stdev=60.98, samples=40
   iops        : min= 2760, max= 2838, avg=2814.77, stdev=15.23, samples=40
  lat (usec)   : 500=99.22%, 750=0.71%, 1000=0.05%
  lat (msec)   : 2=0.02%, 4=0.01%
  cpu          : usr=0.54%, sys=0.43%, ctx=56291, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,56284,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=11.0MiB/s (11.5MB/s), 11.0MiB/s-11.0MiB/s (11.5MB/s-11.5MB/s), io=220MiB (231MB), run=20001-20001msec

Disk stats (read/write):
  vdb: ios=0/65290, merge=0/2898, ticks=0/22339, in_queue=22346, util=99.61%

Platform 5

Fedora39

Intel12700K + Solidigm D5-P5430

fio --name=random_write_latency_test --filename=rand_write_test --filesize=10G --time_based=1 --ramp_time=30s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=4k --iodepth=1 --rw=randwrite -norandommap=1 --lat_percentiles=1 --thread=1 --numjobs=1--group_reporting
random_write_latency_test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.35
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=399MiB/s][w=102k IOPS][eta 00m:00s]
random_write_latency_test: (groupid=0, jobs=1): err= 0: pid=21512: Sun Apr 21 01:56:32 2024
  write: IOPS=101k, BW=395MiB/s (414MB/s)(23.2GiB/60001msec); 0 zone resets
    slat (nsec): min=779, max=72545, avg=928.41, stdev=134.39
    clat (nsec): min=1923, max=4005.2k, avg=8782.28, stdev=4782.13
     lat (usec): min=8, max=4007, avg= 9.71, stdev= 4.79
    clat percentiles (nsec):
     |  1.00th=[ 7904],  5.00th=[ 7968], 10.00th=[ 7968], 20.00th=[ 8032],
     | 30.00th=[ 8096], 40.00th=[ 8096], 50.00th=[ 8096], 60.00th=[ 8160],
     | 70.00th=[ 8160], 80.00th=[ 8160], 90.00th=[ 8256], 95.00th=[ 9792],
     | 99.00th=[31104], 99.50th=[32384], 99.90th=[40192], 99.95th=[44288],
     | 99.99th=[66048]
    lat percentiles (nsec):
     |  1.00th=[ 8768],  5.00th=[ 8896], 10.00th=[ 8896], 20.00th=[ 8896],
     | 30.00th=[ 9024], 40.00th=[ 9024], 50.00th=[ 9024], 60.00th=[ 9024],
     | 70.00th=[ 9152], 80.00th=[ 9152], 90.00th=[ 9280], 95.00th=[10688],
     | 99.00th=[32128], 99.50th=[33536], 99.90th=[40704], 99.95th=[45312],
     | 99.99th=[68096]
   bw (  KiB/s): min=398028, max=409696, per=100.00%, avg=404733.77, stdev=2282.72, samples=120
   iops        : min=99507, max=102424, avg=101183.42, stdev=570.68, samples=120
  lat (usec)   : 2=0.01%, 4=0.01%, 10=95.26%, 20=2.45%, 50=2.27%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=4.11%, sys=14.94%, ctx=6070378, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,6070293,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=395MiB/s (414MB/s), 395MiB/s-395MiB/s (414MB/s-414MB/s), io=23.2GiB (24.9GB), run=60001-60001msec

Disk stats (read/write):
  nvme2n1: ios=0/9052037, merge=0/0, ticks=0/65611, in_queue=65611, util=99.87%
popcnt1 commented 2 months ago

move build tx out of execution: https://github.com/rooch-network/rooch/pull/1390:

huge improvement(It's actually closer to the real server-side execution overhead) , the major cost is from tokio wake

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 12.05s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx              time:   [388.70 µs 389.91 µs 391.12 µs]
                        change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
                        Performance has improved.
popcnt1 commented 2 months ago

move build tx out of execution: #1390:

huge improvement(It's actually closer to the real server-side execution overhead) , the major cost is from tokio wake

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 12.05s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx              time:   [388.70 µs 389.91 µs 391.12 µs]
                        change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
                        Performance has improved.

but I cannot capture execute fn in flamegraph, that's weird. (only validate process)

popcnt1 commented 2 months ago

move build tx out of execution: #1390: huge improvement(It's actually closer to the real server-side execution overhead) , the major cost is from tokio wake

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 12.05s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx              time:   [388.70 µs 389.91 µs 391.12 µs]
                        change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
                        Performance has improved.

but I cannot capture execute fn in flamegraph, that's weird. (only validate process)

that's why no execution process only validation:

2024-02-24T09:13:52.654427Z DEBUG rooch_rpc_server::service::rpc_service: Failed to validate transaction: VMError with status LINKER_ERROR at location UNDEFINED and message Cannot find ModuleId { address: 893046799b1f3d94c73e4b5b7769e193fc0acbe8c149ee058b3df069ca745226, name: Identifier("simple_blog") } in data cache

popcnt1 commented 2 months ago

move build tx out of execution: #1390: huge improvement(It's actually closer to the real server-side execution overhead) , the major cost is from tokio wake

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 12.05s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx              time:   [388.70 µs 389.91 µs 391.12 µs]
                        change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
                        Performance has improved.

but I cannot capture execute fn in flamegraph, that's weird. (only validate process)

fixed link error https://github.com/rooch-network/rooch/pull/1392

move build tx out of execution: #1390: huge improvement(It's actually closer to the real server-side execution overhead) , the major cost is from tokio wake

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 12.05s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
execute_tx              time:   [388.70 µs 389.91 µs 391.12 µs]
                        change: [-97.069% -97.059% -97.049%] (p = 0.00 < 0.05)
                        Performance has improved.

but I cannot capture execute fn in flamegraph, that's weird. (only validate process)

that's why no execution process only validation:

2024-02-24T09:13:52.654427Z DEBUG rooch_rpc_server::service::rpc_service: Failed to validate transaction: VMError with status LINKER_ERROR at location UNDEFINED and message Cannot find ModuleId { address: 893046799b1f3d94c73e4b5b7769e193fc0acbe8c149ee058b3df069ca745226, name: Identifier("simple_blog") } in data cache

fixed in: https://github.com/rooch-network/rooch/pull/1392

RUST_LOG=error cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 13.62s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-624c3ac78421ba7f)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx              time:   [5.5210 ms 6.0733 ms 6.3923 ms]
                        change: [-12.752% -2.5564% +8.6448%] (p = 0.66 > 0.05)
                        No change in performance detected.
popcnt1 commented 2 months ago

with mem fs:

RUST_LOG=error PUREMEM=/Volumes/RAMDisk/rtmp cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
warning: unused import: `std::sync::Arc`
  --> crates/rooch-benchmarks/src/tx.rs:44:5
   |
44 | use std::sync::Arc;
   |     ^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

warning: unused import: `tempfile::TempDir`
  --> crates/rooch-benchmarks/src/tx.rs:46:5
   |
46 | use tempfile::TempDir;
   |     ^^^^^^^^^^^^^^^^^

warning: `rooch-benchmarks` (lib) generated 2 warnings (run `cargo fix --lib -p rooch-benchmarks` to apply 2 suggestions)
    Finished bench [optimized + debuginfo] target(s) in 13.26s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-9809035715c4c70d)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx              time:   [5.1027 ms 5.6767 ms 5.9972 ms]
                        change: [-13.073% -0.8050% +12.977%] (p = 0.91 > 0.05)
                        No change in performance detected.
popcnt1 commented 2 months ago

run bench with tx_data_size = 0, no significant improvement

RUST_LOG=error TX_SIZE=0 PUREMEM=/Volumes/RAMDisk/rtmp cargo bench --bench bench_tx_write
    Finished bench [optimized + debuginfo] target(s) in 1m 37s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-9809035715c4c70d)
INCLUDING DEPENDENCY MoveStdlib
INCLUDING DEPENDENCY MoveosStdlib
INCLUDING DEPENDENCY RoochFramework
BUILDING simple_blog
execute_tx              time:   [5.0727 ms 5.6405 ms 5.9602 ms]
                        change: [-12.465% -0.5425% +11.901%] (p = 0.93 > 0.05)
                        No change in performance detected.
popcnt1 commented 2 months ago

256 bytes tx data size with sync option for rocksdb:

RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write
execute_tx              time:   [5.5854 ms 6.1576 ms 6.5102 ms]
                        change: [-35.667% -15.056% +14.721%] (p = 0.38 > 0.05)
                        No change in performance detected.

256 bytes tx data size with no indexer, sync for rocksdb::

RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write

execute_tx              time:   [2.6807 ms 3.0280 ms 3.2322 ms]
                        change: [-5.5588% +9.1633% +27.002%] (p = 0.29 > 0.05)
                        No change in performance detected.

256 bytes tx data size with no indexer, no sync for rocksdb:

RUST_LOG=error TX_SIZE=256 cargo bench --bench bench_tx_write
execute_tx              time:   [2.4587 ms 2.8583 ms 3.0955 ms]
                        change: [+11.088% +47.174% +123.71%] (p = 0.02 < 0.05)
                        Performance has regressed.
popcnt1 commented 2 months ago

without indexer updates

transfer tx:

RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx              time:   [3.0738 ms 3.1135 ms 3.1530 ms]
                        change: [+0.6383% +8.0757% +19.114%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

transfer tx with mem filesystem:

RUST_LOG=error DATA_DIR=/volumes/RAMDisk TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx              time:   [2.8923 ms 2.9551 ms 2.9870 ms]
                        change: [-8.7169% -6.1064% -3.4318%] (p = 0.00 < 0.05)
                        Performance has improved.

empty tx:

RUST_LOG=error TX_TYPE=empty cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 13.77s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-b8caaaf38acbd4cd)
execute_tx              time:   [1.6870 ms 1.7017 ms 1.7141 ms]
                        change: [+7.7499% +8.5624% +9.4147%] (p = 0.00 < 0.05)
                        Performance has regressed.

empty tx with mem filesystem:

RUST_LOG=error DATA_DIR=/volumes/RAMDisk TX_TYPE=empty cargo bench --bench bench_tx_write
   Compiling rooch-benchmarks v0.1.0 (/Users/templex/rooch/bodhi/rooch/crates/rooch-benchmarks)
    Finished bench [optimized + debuginfo] target(s) in 15.92s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-b8caaaf38acbd4cd)
execute_tx              time:   [1.5552 ms 1.5570 ms 1.5580 ms]
                        change: [-1.5589% +0.0792% +2.9995%] (p = 0.95 > 0.05)
                        No change in performance detected.
Found 3 outliers among 10 measurements (30.00%)
  2 (20.00%) low severe
  1 (10.00%) high severe
popcnt1 commented 2 months ago

for no_sync rocksdb (put & write_batch), I log rocksdb operation's cost and using script to output this:

./get_time.sh
Total put operations: 8016
Total time elapsed in put operations (in ns): 49929940
Average time per put operation (in ns): 6228.78
Total bytes in put operations: 1431756
Average bytes per put operation: 178.612
Average number of put operations per execute_tx: 5.00062
Average time of put operations per execute_tx (in ns): 31147.8
---------
Total get operations: 128334
Total time elapsed in get operations (in ns): 103305590
Average time per get operation (in ns): 804.974
Total bytes in get operations: 43787172
Average bytes per get operation: 341.197
Average number of get operations per execute_tx: 80.0586
Average time of get operations per execute_tx (in ns): 64445.2
---------
Total write_batch operations: 4834
Total time elapsed in write_batch operations (in ns): 3.53812e+07
Average time per write_batch operation (in ns): 7319.23
Total bytes in write_batch operations: 865968
Average bytes per write_batch operation: 179.141
Average number of write_batch operations per execute_tx: 3.0156
Average time of write_batch operations per execute_tx (in ns): 22071.8
---------
popcnt1 commented 2 months ago

for no_sync rocksdb (put & write_batch), I log rocksdb operation's cost and using script to output this (187us/tx):

./get_time.sh
Total put operations: 8016
Total time elapsed in put operations (in µs): 44234.4
Average time per put operation (in µs): 5.51827
Total bytes in put operations: 1431756
Average bytes per put operation: 178.612
Average number of put operations per execute_tx: 5.00062
Average time of put operations per execute_tx (in µs): 27.5948
---------
Total get operations: 128334
Total time elapsed in get operations (in µs): 103229
Average time per get operation (in µs): 0.80438
Total bytes in get operations: 43787172
Average bytes per get operation: 341.197
Average number of get operations per execute_tx: 80.0586
Average time of get operations per execute_tx (in µs): 64.3976
---------
Total write_batch operations: 4834
Total time elapsed in write_batch operations (in µs): 39028
Average time per write_batch operation (in µs): 8.07365
Total bytes in write_batch operations: 865968
Average bytes per write_batch operation: 179.141
Average number of write_batch operations per execute_tx: 3.0156
Average time of write_batch operations per execute_tx (in µs): 24.3469
---------
Total put_sync operations: 3209
Total time elapsed in put_sync operations (in µs): 116106
Average time per put_sync operation (in µs): 36.1815
Total bytes in put_sync operations: 167014
Average bytes per put_sync operation: 52.0455
Average number of put_sync operations per execute_tx: 2.00187
Average time of put_sync operations per execute_tx (in µs): 72.4307
---------
popcnt1 commented 2 months ago

persistence optimization log

  1. remove unnecessary sync write
  2. combine sync write

before persistence optimization:

RUST_LOG=error TX_TYPE=empty cargo bench --bench bench_tx_write
    Finished bench [optimized] target(s) in 0.73s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-9a00dff23725eca7)
execute_tx              time:   [1.9076 ms 1.9201 ms 1.9316 ms]
                        change: [-2.3943% -0.6263% +0.8086%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
    Finished bench [optimized] target(s) in 0.39s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-9a00dff23725eca7)
execute_tx              time:   [3.0622 ms 3.1346 ms 3.1687 ms]
                        change: [-5.9144% -2.1965% +1.5103%] (p = 0.30 > 0.05)
                        No change in performance detected.

Optimizing

after change default write options: set_sync -> false

RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx              time:   [2.8585 ms 2.9698 ms 3.0495 ms]
                        change: [-7.9807% -1.4107% +4.2084%] (p = 0.72 > 0.05)
                        No change in performance detected.

rocksdb v0.21.0 -> v0.22.0 and rm multithread cf:

    Finished bench [optimized] target(s) in 0.41s
     Running benches/bench_tx_write.rs (/Users/templex/rooch/subodhi/rooch/target/release/deps/bench_tx_write-892d2f3518007bf1)
execute_tx              time:   [2.8199 ms 2.8966 ms 2.9364 ms]
                        change: [-4.1880% -1.1769% +2.1003%] (p = 0.50 > 0.05)
                        No change in performance detected.
popcnt1 commented 1 month ago

on platform2:

RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
execute_tx              time:   [2.8009 ms 2.8532 ms 2.8790 ms]
                        change: [-3.1362% -0.7240% +1.6901%] (p = 0.58 > 0.05)
                        No change in performance detected.
popcnt1 commented 1 month ago

buf write nodes when apply change_change_set(https://github.com/popcnt-subodhi/rooch/commit/660cd43f20d7583a05d77b178207f62d4eb0734f):

on platform2:

RUST_LOG=error TX_TYPE=transfer cargo bench --bench bench_tx_write
    Finished bench [optimized] target(s) in 0.30s
     Running benches/bench_tx_write.rs (/home/templex/rooch/bodhi/rooch/target/release/deps/bench_tx_write-f73555ca8f703ae2)
execute_tx              time:   [2.7433 ms 2.7940 ms 2.8186 ms]
                        change: [-2.9591% -0.2739% +2.1993%] (p = 0.85 > 0.05)
                        No change in performance detected.
popcnt1 commented 1 month ago

execute l1 block tx for btc block after 800,000(height):

data import mode avg cost
none 4.4906s
ord 5.7325s
utxo 4.4913s

for btc block after 200,000(height) is 132.46ms:

building profile effect:

  1. overflow-checks = false codegen-units = 1
tx_exec_blk             time:   [4.0600 s 4.4255 s 4.8032 s]
                        change: [-13.149% -1.5229% +10.922%] (p = 0.81 > 0.05)
                        No change in performance detected.
  1. overflow-checks = false codegen-units = 1 lto="thin"
tx_exec_blk             time:   [3.5628 s 3.9052 s 4.2501 s]
                        change: [-22.557% -11.757% -0.5718%] (p = 0.07 > 0.05)
                        No change in performance detected.
  1. overflow-checks = false codegen-units = 256 lto="thin"
tx_exec_blk             time:   [3.7183 s 4.0718 s 4.4269 s]
                        change: [-7.5759% +4.2659% +18.680%] (p = 0.54 > 0.05)
                        No change in performance detected.
popcnt1 commented 1 month ago
bench avg cost
exec empty 1.0313ms
exec transfer 1.5611ms
exec btc block (>= 800,000 height) 2.384s