yomaytk / elfconv

An experimental AOT compiler that translates Linux ELF binaries to WebAssembly
Apache License 2.0

performance (aarch64 -> x86-64) #15

Open · yomaytk opened this issue 7 months ago

yomaytk commented 7 months ago

commit: 1d92b7ea2c5bea998041ca6244b3571ff59855a7

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  78.19%   5.78%  16.04%  77929.716
       2   0.01  79.14%   5.78%  15.08%  77366.616
       4   0.01  78.02%   5.69%  16.29%  77062.886
       8   0.02  78.17%   5.84%  15.99%  83402.179
      16   0.03  79.13%   5.78%  15.09%  101010.101
      32   0.06  78.46%   5.71%  15.82%  110786.677
      64   0.12  78.73%   5.82%  15.46%  114954.215
     128   0.23  79.23%   5.80%  14.97%  114825.215
     256   0.47  78.65%   5.94%  15.41%  114702.031
     512   0.94  78.80%   5.89%  15.31%  114201.400
    1024   1.86  78.85%   5.87%  15.27%  114689.887
    2048   3.73  78.78%   5.92%  15.31%  114530.256
    4096   7.42  78.78%   5.89%  15.33%  115186.897
    8192  14.87  78.79%   5.87%  15.34%  114951.238
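The Reps and Time(s) columns follow a simple pattern. Reps doubles on every row, and the sweep stops at the first row whose Time(s) is at least 10 s (7.42 s at 4096 reps, then 14.87 s at 8192). Every other table in this thread ends the same way. Below is a minimal sketch of such a driver loop. It is inferred from the printed rows rather than copied from the benchmark source, and `run` is a hypothetical callback that times one pass of `nreps` repetitions.

```python
from typing import Callable, List, Tuple


def rep_sweep(run: Callable[[int], float],
              min_seconds: float = 10.0) -> List[Tuple[int, float]]:
    """Run with 1, 2, 4, ... reps; stop after the first run lasting >= min_seconds."""
    rows: List[Tuple[int, float]] = []
    nreps = 1
    while True:
        elapsed = run(nreps)  # hypothetical: wall time of one pass with nreps reps
        rows.append((nreps, elapsed))
        if elapsed >= min_seconds:
            return rows
        nreps *= 2


# Simulated timer: a fixed cost per rep taken from the last row above (14.87 s / 8192 reps).
per_rep = 14.87 / 8192
rows = rep_sweep(lambda n: n * per_rep)
print(len(rows), rows[-1][0])  # 14 8192
```

In this example, a fixed per-rep cost stands in for the real timer, so the sweep reproduces this table's 14 rows and its stopping point.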
yomaytk commented 7 months ago

commit: f8b59de36ae69ac4e998b7ef36c2302f35e0d17d

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  82.00%   5.09%  12.91%  99195.209
       2   0.00  83.27%   5.34%  11.39%  105409.706
       4   0.01  83.66%   5.31%  11.03%  106698.878
       8   0.01  84.45%   5.19%  10.36%  106642.521
      16   0.03  84.29%   5.30%  10.41%  111049.999
      32   0.04  83.62%   5.60%  10.77%  145967.811
      64   0.08  83.88%   5.44%  10.68%  152222.985
     128   0.17  83.88%   5.41%  10.71%  151117.229
     256   0.35  84.06%   5.37%  10.57%  144163.442
     512   0.71  84.02%   5.30%  10.68%  142752.381
    1024   1.38  84.04%   5.35%  10.61%  146869.749
    2048   2.88  83.81%   5.49%  10.70%  140875.488
    4096   5.16  84.00%   5.36%  10.65%  156850.471
    8192  10.41  84.03%   5.33%  10.63%  155571.833
yomaytk commented 7 months ago

commit: ca277bb

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  82.56%   5.23%  12.21%  233995.585
       2   0.00  84.46%   5.26%  10.28%  232762.407
       4   0.00  84.37%   5.30%  10.33%  234695.007
       8   0.01  84.53%   5.24%  10.23%  234383.637
      16   0.01  84.52%   5.24%  10.25%  234909.554
      32   0.03  84.44%   5.26%  10.30%  235026.745
      64   0.05  84.50%   5.23%  10.27%  234924.197
     128   0.11  84.49%   5.23%  10.28%  235092.717
     256   0.21  84.51%   5.22%  10.27%  234968.135
     512   0.43  84.51%   5.21%  10.28%  235031.631
    1024   0.86  84.50%   5.22%  10.28%  235001.405
    2048   1.72  84.50%   5.22%  10.28%  235005.832
    4096   3.43  84.51%   5.22%  10.27%  234966.991
    8192   6.86  84.51%   5.22%  10.27%  234955.204
   16384  13.73  84.51%   5.22%  10.27%  234963.062
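In the long-running rows, these three commits settle at roughly 115k, 140k to 157k, and 235k KFLOPS. One way to reduce each run to a single number is to take the median KFLOPS over rows that ran for at least 1 s, where timer resolution matters least. The 1 s cutoff and the use of the median are assumptions chosen for illustration. They are not part of the benchmark.

```python
from statistics import median
from typing import Dict, List, Tuple

# (Time(s), KFLOPS) for the rows with at least 512 reps in each table above.
runs: Dict[str, List[Tuple[float, float]]] = {
    "1d92b7e": [(0.94, 114201.400), (1.86, 114689.887), (3.73, 114530.256),
                (7.42, 115186.897), (14.87, 114951.238)],
    "f8b59de": [(0.71, 142752.381), (1.38, 146869.749), (2.88, 140875.488),
                (5.16, 156850.471), (10.41, 155571.833)],
    "ca277bb": [(0.43, 235031.631), (0.86, 235001.405), (1.72, 235005.832),
                (3.43, 234966.991), (6.86, 234955.204), (13.73, 234963.062)],
}


def steady_kflops(rows: List[Tuple[float, float]], min_seconds: float = 1.0) -> float:
    """Median KFLOPS over rows that ran for at least min_seconds (assumed cutoff)."""
    return median(k for t, k in rows if t >= min_seconds)


baseline = steady_kflops(runs["1d92b7e"])
for commit, rows in runs.items():
    k = steady_kflops(rows)
    print(f"{commit}: {k:10.1f} KFLOPS, {k / baseline:.2f}x vs 1d92b7e")
```

With these inputs, ca277bb comes out at about 2.05x the 1d92b7e figure and f8b59de at about 1.32x. The f8b59de rows are noisier (140875 to 156850 KFLOPS), so its summary is less stable.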
yomaytk commented 5 months ago

commit: 6522dea

target: examples/mnist-neural-network-plain-c (STEP_SIZE changed from 1000 to 20)

QEMU (aarch64 on x86-64)

real    0m50.277s
user    0m50.179s
sys     0m0.048s

AOT compile (aarch64 to x86-64)

real    1m5.077s
user    1m4.913s
sys     0m0.076s
yomaytk commented 5 months ago

commit: 229e118

target: a program that computes the largest prime number less than a given positive number
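The source of this benchmark program is not included in the thread. Below is a hypothetical sketch of a workload with this shape. It assumes that every candidate below the limit is tested by trial division and that the last prime found is kept. The real program may use a different method (for example, a sieve), and it runs as an aarch64 binary rather than as Python.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial division by 2 and then by odd divisors up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def max_prime_below(limit: int) -> Optional[int]:
    """Test every candidate in [2, limit) and keep the last prime found."""
    largest = None
    for n in range(2, limit):
        if is_prime(n):
            largest = n
    return largest


print(max_prime_below(100))  # 97
```

Scanning upward over every candidate, instead of stopping at the first prime found while scanning down from the limit, keeps the running time proportional to the limit. That matches how the reported times grow between the 10,000,000 and 50,000,000 cases.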

Case 1: 10,000,000

QEMU (aarch64 on x86-64)

real    0m9.059s
user    0m9.013s
sys     0m0.028s

AOT compile (aarch64 to x86-64)

real    0m5.927s
user    0m5.914s
sys     0m0.001s

Case 2: 50,000,000

QEMU (aarch64 on x86-64)

real    1m23.709s
user    1m23.660s
sys     0m0.004s

AOT compile (aarch64 to x86-64)

real    0m57.401s
user    0m57.297s
sys     0m0.008s
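The `real` lines are wall-clock times in bash's `time` format (`XmY.YYYs`). Dividing the QEMU time by the AOT time gives a speedup factor, where a value above 1 means the AOT-compiled binary was faster. Here is a small sketch that converts the figures reported above:

```python
import re


def parse_duration(text: str) -> float:
    """Convert a bash `time` field such as '1m5.077s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# (QEMU real, AOT real) pairs copied from the comments above.
runs = {
    "mnist (6522dea)": ("0m50.277s", "1m5.077s"),
    "primes < 10,000,000 (229e118)": ("0m9.059s", "0m5.927s"),
    "primes < 50,000,000 (229e118)": ("1m23.709s", "0m57.401s"),
}

for name, (qemu, aot) in runs.items():
    ratio = parse_duration(qemu) / parse_duration(aot)
    print(f"{name}: QEMU/AOT = {ratio:.2f}")
```

With these inputs, the ratios come out at about 0.77 for the MNIST example (the AOT build is slower), 1.53 for the 10,000,000 case, and 1.46 for the 50,000,000 case.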
yomaytk commented 2 weeks ago

commit: 5f29035f51fec8fc2851019095cb149c73a3b4a3

LINPACK benchmark test.

QEMU

Array size: 100
Memory required:  79K.
LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  66.21%  16.80%  16.99%  103616.813
       2   0.00  81.45%   5.35%  13.20%  231693.989
       4   0.00  82.00%   5.24%  12.76%  231846.019
       8   0.01  81.91%   5.28%  12.80%  232647.462
      16   0.01  81.94%   5.17%  12.89%  233493.034
      32   0.03  81.95%   5.20%  12.84%  234432.234
      64   0.06  81.94%   5.16%  12.89%  234315.635
     128   0.11  82.16%   5.17%  12.66%  229742.589
     256   0.22  81.96%   5.15%  12.88%  234160.350
     512   0.44  81.96%   5.19%  12.85%  234100.354
    1024   0.89  81.97%   5.18%  12.86%  234009.206
    2048   1.77  81.97%   5.18%  12.85%  233958.061
    4096   3.55  81.99%   5.17%  12.84%  233579.784
    8192   7.10  81.96%   5.17%  12.87%  233918.242
   16384  14.20  81.96%   5.17%  12.87%  233981.758

elfconv

Array size: 100
Memory required:  79K.
LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  79.94%   5.67%  14.40%  316040.549
       2   0.00  82.30%   5.59%  12.12%  316323.486
       4   0.00  82.31%   5.55%  12.14%  321212.121
       8   0.01  82.26%   5.49%  12.24%  320629.159
      16   0.01  82.45%   5.47%  12.08%  320847.522
      32   0.02  82.39%   5.40%  12.21%  322439.590
      64   0.04  82.45%   5.41%  12.14%  321696.494
     128   0.08  82.48%   5.42%  12.10%  320997.819
     256   0.16  82.48%   5.43%  12.09%  320599.612
     512   0.32  82.45%   5.43%  12.11%  321153.958
    1024   0.64  82.47%   5.41%  12.12%  321338.785
    2048   1.28  82.45%   5.41%  12.14%  321598.702
    4096   2.56  82.46%   5.41%  12.13%  321493.685
    8192   5.12  82.46%   5.40%  12.14%  321946.676
   16384  10.24  82.45%   5.41%  12.14%  321664.426
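Comparing the two tables by hand works, but the printed layout is easy to parse when more runs need comparing. Below is a minimal parser sketch, applied to the last two rows of each table above. The `Row` type and `parse_linpack` name are illustrative and are not part of elfconv or the benchmark.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    reps: int
    seconds: float
    dgefa_pct: float
    dgesl_pct: float
    overhead_pct: float
    kflops: float


def parse_linpack(text: str) -> List[Row]:
    """Keep only data rows: six fields whose first field is an integer rep count."""
    rows = []
    for line in text.splitlines():
        fields = line.replace("%", "").split()
        if len(fields) == 6 and fields[0].isdigit():
            rows.append(Row(int(fields[0]), *map(float, fields[1:])))
    return rows


# Last two rows of each table above (5f29035), pasted verbatim.
QEMU_OUT = """
    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
    8192   7.10  81.96%   5.17%  12.87%  233918.242
   16384  14.20  81.96%   5.17%  12.87%  233981.758
"""
ELFCONV_OUT = """
    8192   5.12  82.46%   5.40%  12.14%  321946.676
   16384  10.24  82.45%   5.41%  12.14%  321664.426
"""

qemu, elfconv = parse_linpack(QEMU_OUT), parse_linpack(ELFCONV_OUT)
for rows in (qemu, elfconv):
    # DGEFA, DGESL and OVERHEAD are shares of Time(s), so they should sum to ~100%.
    assert all(abs(r.dgefa_pct + r.dgesl_pct + r.overhead_pct - 100) < 0.1 for r in rows)
print(f"elfconv / QEMU KFLOPS: {elfconv[-1].kflops / qemu[-1].kflops:.2f}")
```

On the 16384-rep rows, this gives about 1.37x in elfconv's favor.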
yomaytk commented 1 week ago

commit: 3f083b2fddc7db57e0ca1052435213d759e68929

LINPACK benchmark test.

elfconv (unoptimized)

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  82.23%   5.52%  12.24%  241678.067
       2   0.00  81.98%   5.20%  12.83%  259995.094
       4   0.00  84.34%   5.43%  10.23%  263977.089
       8   0.01  84.20%   5.48%  10.32%  278489.327
      16   0.01  84.39%   5.76%   9.85%  273319.152
      32   0.02  83.26%   5.61%  11.13%  267297.084
      64   0.05  85.02%   5.53%   9.45%  263368.351
     128   0.09  84.73%   5.31%   9.97%  275849.730
     256   0.18  84.71%   5.50%   9.78%  279862.791
     512   0.36  84.67%   5.31%  10.03%  280616.413
    1024   0.71  84.77%   5.35%   9.88%  281750.976
    2048   1.42  84.77%   5.37%   9.86%  282224.808
    4096   2.87  84.78%   5.34%   9.88%  280201.441
    8192   5.68  84.75%   5.38%   9.86%  282869.095
   16384  11.37  84.80%   5.36%   9.84%  282437.931

elfconv (optimized)

Array size: 100
Memory required:  79K.
LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00  84.05%   4.64%  11.31%  237136.465
       2   0.00  80.08%   8.93%  11.00%  256782.946
       4   0.00  85.41%   4.82%   9.77%  256782.946
       8   0.01  83.23%   6.30%  10.48%  273795.686
      16   0.01  83.67%   5.36%  10.97%  333529.990
      32   0.02  84.17%   5.10%  10.73%  304269.824
      64   0.04  83.84%   5.38%  10.79%  323435.742
     128   0.07  83.71%   5.33%  10.96%  349645.664
     256   0.14  83.55%   5.41%  11.04%  355452.161
     512   0.28  83.72%   5.55%  10.73%  359565.331
    1024   0.56  83.78%   5.43%  10.79%  360711.720
    2048   1.12  83.83%   5.37%  10.80%  362779.779
    4096   2.24  83.80%   5.36%  10.83%  362214.486
    8192   4.52  83.75%   5.36%  10.89%  359542.820
   16384   8.91  83.80%   5.37%  10.84%  364528.111
   32768  17.95  83.80%   5.37%  10.83%  361592.626
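Both runs report the same Reps values up to 16384, so the optimized and unoptimized builds can be compared row by row. The sketch below pairs rows by Reps and uses only the rows with at least 1024 reps, because the short early rows fluctuate more (for example, 333529.990 KFLOPS at 16 reps followed by 304269.824 at 32). The 1024-rep cutoff is an assumption made for illustration.

```python
from statistics import mean

# (Reps -> KFLOPS) for the rows with at least 1024 reps in the two tables above (3f083b2).
unoptimized = {1024: 281750.976, 2048: 282224.808, 4096: 280201.441,
               8192: 282869.095, 16384: 282437.931}
optimized = {1024: 360711.720, 2048: 362779.779, 4096: 362214.486,
             8192: 359542.820, 16384: 364528.111, 32768: 361592.626}

# Pair rows by Reps; 32768 appears only in the optimized run, so it is dropped.
common = sorted(unoptimized.keys() & optimized.keys())
speedups = {reps: optimized[reps] / unoptimized[reps] for reps in common}
for reps, s in speedups.items():
    print(f"{reps:>6} reps: {s:.3f}x")
print(f"mean: {mean(list(speedups.values())):.3f}x")
```

Each common row comes out between about 1.27x and 1.29x, for an average of roughly 1.28x.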
yomaytk commented 1 week ago

commit: 919e51d7419fa148b5c4c54e268b8a2c16f43854

Wasm in the browser via elfconv (x86-64 VM on AWS)

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.00 100.00%   0.00%   0.00%  176637.356
       2   0.00 100.00%   0.00%   0.00%  176679.472
       4   0.00 100.00%   0.00%   0.00%  235553.908
       8   0.01  80.00%   0.00%  20.00%  353316.823
      16   0.01  76.92%   0.00%  23.08%  282660.197
      32   0.02  83.33%   0.00%  16.67%  282663.567
      64   0.05  80.43%   4.35%  15.22%  289914.275
     128   0.09  86.96%   6.52%   6.52%  262945.841
     256   0.19  84.74%   4.74%  10.53%  266037.977
     512   0.37  86.41%   5.16%   8.42%  268407.792
    1024   0.73  84.46%   3.85%  11.69%  281785.789
    2048   1.47  85.69%   5.52%   8.79%  270413.637
    4096   2.95  84.97%   5.39%   9.63%  271631.652
    8192   5.91  85.78%   5.28%   8.94%  268956.095
   16384  11.89  85.58%   5.06%   9.35%  268606.750

Wasm in the browser via container2wasm (x86-64 VM on AWS)

Array size: 100
Memory required:  79K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.16  70.53%   4.99%  24.49%   1427.609
       2   0.30  77.38%   4.52%  18.10%   1434.652
       4   0.59  76.85%   4.97%  18.19%   1464.503
       8   1.18  76.70%   4.94%  18.37%   1468.032
      16   2.35  77.00%   4.95%  18.04%   1465.631
      32   4.71  77.11%   4.93%  17.96%   1463.523
      64   9.64  76.93%   4.96%  18.11%   1432.674
     128  19.06  77.06%   4.97%  17.97%   1446.058
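The two browser runs differ by about two orders of magnitude, and the exact factor depends on which time is compared. In every table in this thread, KFLOPS × (DGEFA% + DGESL%) × Time(s) / Reps comes out at about 176.6 to 176.7. That suggests KFLOPS is computed from the DGEFA + DGESL time only, excluding OVERHEAD. This is an inference from the printed numbers, not from the benchmark source. The sketch below compares the last row of each run both ways. The `LastRow` type is illustrative.

```python
from typing import NamedTuple


class LastRow(NamedTuple):
    """Final row of a LINPACK table: Reps, Time(s), DGEFA %, DGESL %, KFLOPS."""
    reps: int
    seconds: float
    dgefa_pct: float
    dgesl_pct: float
    kflops: float

    def wall_per_rep(self) -> float:
        """Seconds per rep, including OVERHEAD."""
        return self.seconds / self.reps

    def compute_per_rep(self) -> float:
        """Seconds per rep spent in DGEFA + DGESL only."""
        return self.seconds * (self.dgefa_pct + self.dgesl_pct) / 100 / self.reps


elfconv = LastRow(16384, 11.89, 85.58, 5.06, 268606.750)
c2w = LastRow(128, 19.06, 77.06, 4.97, 1446.058)

print(f"KFLOPS ratio:      {elfconv.kflops / c2w.kflops:.1f}")
print(f"compute/rep ratio: {c2w.compute_per_rep() / elfconv.compute_per_rep():.1f}")
print(f"wall/rep ratio:    {c2w.wall_per_rep() / elfconv.wall_per_rep():.1f}")
# KFLOPS x compute seconds per rep: roughly the same constant for both runs.
for row in (elfconv, c2w):
    print(f"work per rep: {row.kflops * row.compute_per_rep():.1f}")
```

KFLOPS and per-rep compute time both put elfconv ahead by about 186x. Per-rep wall time, which includes OVERHEAD, gives about 205x, because container2wasm spends a larger share of each run in OVERHEAD (about 18% versus 9%).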