yomaytk opened 7 months ago
commit: f8b59de36ae69ac4e998b7ef36c2302f35e0d17d
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 82.00% 5.09% 12.91% 99195.209
2 0.00 83.27% 5.34% 11.39% 105409.706
4 0.01 83.66% 5.31% 11.03% 106698.878
8 0.01 84.45% 5.19% 10.36% 106642.521
16 0.03 84.29% 5.30% 10.41% 111049.999
32 0.04 83.62% 5.60% 10.77% 145967.811
64 0.08 83.88% 5.44% 10.68% 152222.985
128 0.17 83.88% 5.41% 10.71% 151117.229
256 0.35 84.06% 5.37% 10.57% 144163.442
512 0.71 84.02% 5.30% 10.68% 142752.381
1024 1.38 84.04% 5.35% 10.61% 146869.749
2048 2.88 83.81% 5.49% 10.70% 140875.488
4096 5.16 84.00% 5.36% 10.65% 156850.471
8192 10.41 84.03% 5.33% 10.63% 155571.833
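The KFLOPS column can be cross-checked against the Time(s), DGEFA and DGESL columns. The following is a hedged reconstruction that the output does not state. It assumes the program is a netlib-style linpack.c that solves an order n = 50 system stored in the 100 X 100 array, counts the standard LINPACK total of 2n^3/3 + 2n^2 operations per solve, and includes a factor of 2 in its KFLOPS formula. These assumptions are inferred only because they reproduce the reported figures. For the 8192-rep row above, with DGEFA and DGESL written as fractions (0.8403 + 0.0533 = 0.8936):

```latex
\begin{aligned}
\mathrm{ops}(n) &= \tfrac{2}{3}n^{3} + 2n^{2},
  \qquad \mathrm{ops}(50) = \tfrac{250000}{3} + 5000 \approx 88{,}333.3 \\
\mathrm{KFLOPS} &\approx \frac{2 \cdot \mathrm{Reps} \cdot \mathrm{ops}(n)}
  {1000 \cdot \mathrm{Time} \cdot (\mathrm{DGEFA} + \mathrm{DGESL})} \\
&= \frac{2 \cdot 8192 \cdot 88{,}333.3}{1000 \cdot 10.41 \cdot 0.8936}
  = \frac{1.4473 \times 10^{9}}{9302.4} \approx 155{,}579
\end{aligned}
```

The small gap to the reported 155571.833 is consistent with Time(s) and the percentages being rounded to two decimals.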
commit: ca277bb
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 82.56% 5.23% 12.21% 233995.585
2 0.00 84.46% 5.26% 10.28% 232762.407
4 0.00 84.37% 5.30% 10.33% 234695.007
8 0.01 84.53% 5.24% 10.23% 234383.637
16 0.01 84.52% 5.24% 10.25% 234909.554
32 0.03 84.44% 5.26% 10.30% 235026.745
64 0.05 84.50% 5.23% 10.27% 234924.197
128 0.11 84.49% 5.23% 10.28% 235092.717
256 0.21 84.51% 5.22% 10.27% 234968.135
512 0.43 84.51% 5.21% 10.28% 235031.631
1024 0.86 84.50% 5.22% 10.28% 235001.405
2048 1.72 84.50% 5.22% 10.28% 235005.832
4096 3.43 84.51% 5.22% 10.27% 234966.991
8192 6.86 84.51% 5.22% 10.27% 234955.204
16384 13.73 84.51% 5.22% 10.27% 234963.062
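The short runs at the top of each table (Time(s) well under a second) are noisy, especially at commit f8b59de. A hedged way to compare the two commits is to average KFLOPS only over rows that ran for at least one second. The one-second cutoff and the helper names below are arbitrary choices for illustration, not part of the benchmark:

```python
from statistics import mean

def parse_linpack_rows(text):
    """Parse 'Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS' rows into tuples."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and any other output lines.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        reps, time_s = int(fields[0]), float(fields[1])
        dgefa, dgesl, overhead = (float(f.rstrip("%")) for f in fields[2:5])
        rows.append((reps, time_s, dgefa, dgesl, overhead, float(fields[5])))
    return rows

def steady_state_kflops(rows, min_time=1.0):
    """Mean KFLOPS over rows whose Time(s) is at least min_time seconds."""
    return mean(r[5] for r in rows if r[1] >= min_time)

# Last six rows of each run, copied from the output above.
F8B = """
256 0.35 84.06% 5.37% 10.57% 144163.442
512 0.71 84.02% 5.30% 10.68% 142752.381
1024 1.38 84.04% 5.35% 10.61% 146869.749
2048 2.88 83.81% 5.49% 10.70% 140875.488
4096 5.16 84.00% 5.36% 10.65% 156850.471
8192 10.41 84.03% 5.33% 10.63% 155571.833
"""
CA2 = """
512 0.43 84.51% 5.21% 10.28% 235031.631
1024 0.86 84.50% 5.22% 10.28% 235001.405
2048 1.72 84.50% 5.22% 10.28% 235005.832
4096 3.43 84.51% 5.22% 10.27% 234966.991
8192 6.86 84.51% 5.22% 10.27% 234955.204
16384 13.73 84.51% 5.22% 10.27% 234963.062
"""

before = steady_state_kflops(parse_linpack_rows(F8B))
after = steady_state_kflops(parse_linpack_rows(CA2))
ratio = after / before
print(f"{before:.0f} -> {after:.0f} KFLOPS ({ratio:.2f}x)")
```

With these rows it reports about 150042 KFLOPS at f8b59de and 234973 KFLOPS at ca277bb, roughly a 1.57x improvement.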
commit: 6522dea
target: examples/mnist-neural-network-plain-c (STEP_SIZE changed from 1000 to 20)
QEMU (aarch64 on x86-64)
real 0m50.277s
user 0m50.179s
sys 0m0.048s
AOT-compiled (aarch64 to x86-64)
real 1m5.077s
user 1m4.913s
sys 0m0.076s
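The real/user/sys lines look like the output of the bash `time` keyword (an assumption, since the issue does not say which shell was used). A small Python helper, `parse_time` (a hypothetical name), turns those values into seconds so the two runs can be compared directly:

```python
import re

def parse_time(value):
    """Convert a bash `time` field such as '1m5.077s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

qemu_real = parse_time("0m50.277s")
aot_real = parse_time("1m5.077s")
slowdown = aot_real / qemu_real  # > 1 means the AOT-compiled binary was slower
print(f"QEMU {qemu_real:.3f}s, AOT {aot_real:.3f}s, AOT/QEMU = {slowdown:.2f}")
```

For this target the AOT-compiled binary took about 1.29x the QEMU wall-clock time, whereas in the prime-number cases below it was the faster of the two.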
commit: 229e118
target: a program that computes the largest prime number less than a given positive number
Case 1: 10,000,000
QEMU (aarch64 on x86-64)
real 0m9.059s
user 0m9.013s
sys 0m0.028s
AOT-compiled (aarch64 to x86-64)
real 0m5.927s
user 0m5.914s
sys 0m0.001s
Case 2: 50,000,000
QEMU (aarch64 on x86-64)
real 1m23.709s
user 1m23.660s
sys 0m0.004s
AOT-compiled (aarch64 to x86-64)
real 0m57.401s
user 0m57.297s
sys 0m0.008s
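The issue does not include the source of the prime benchmark. As a rough, hypothetical illustration of the workload (the real program may well use trial division or another method), a Python version that finds the largest prime strictly below a positive number with a sieve of Eratosthenes could look like this; `largest_prime_below` is a made-up name:

```python
from typing import Optional

def largest_prime_below(n: int) -> Optional[int]:
    """Return the largest prime p with p < n, or None if there is none (n <= 2)."""
    if n <= 2:
        return None
    # is_prime[k] == 1 means k (0 <= k < n) has not been crossed out yet.
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p < n:
        if is_prime[p]:
            # Cross out multiples of p from p*p on; smaller multiples were
            # already crossed out by smaller primes.
            is_prime[p * p::p] = bytes(len(range(p * p, n, p)))
        p += 1
    # Scan downward from n - 1 for the first entry that survived the sieve.
    for k in range(n - 1, 1, -1):
        if is_prime[k]:
            return k
    return None

if __name__ == "__main__":
    for limit in (10, 100, 10_000_000):
        print(limit, largest_prime_below(limit))
```

Scanning downward after the sieve keeps the result strictly below the limit, matching "less than the specified positive number".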
commit: 5f29035f51fec8fc2851019095cb149c73a3b4a3
LINPACK benchmark test.
QEMU
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 66.21% 16.80% 16.99% 103616.813
2 0.00 81.45% 5.35% 13.20% 231693.989
4 0.00 82.00% 5.24% 12.76% 231846.019
8 0.01 81.91% 5.28% 12.80% 232647.462
16 0.01 81.94% 5.17% 12.89% 233493.034
32 0.03 81.95% 5.20% 12.84% 234432.234
64 0.06 81.94% 5.16% 12.89% 234315.635
128 0.11 82.16% 5.17% 12.66% 229742.589
256 0.22 81.96% 5.15% 12.88% 234160.350
512 0.44 81.96% 5.19% 12.85% 234100.354
1024 0.89 81.97% 5.18% 12.86% 234009.206
2048 1.77 81.97% 5.18% 12.85% 233958.061
4096 3.55 81.99% 5.17% 12.84% 233579.784
8192 7.10 81.96% 5.17% 12.87% 233918.242
16384 14.20 81.96% 5.17% 12.87% 233981.758
elfconv
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 79.94% 5.67% 14.40% 316040.549
2 0.00 82.30% 5.59% 12.12% 316323.486
4 0.00 82.31% 5.55% 12.14% 321212.121
8 0.01 82.26% 5.49% 12.24% 320629.159
16 0.01 82.45% 5.47% 12.08% 320847.522
32 0.02 82.39% 5.40% 12.21% 322439.590
64 0.04 82.45% 5.41% 12.14% 321696.494
128 0.08 82.48% 5.42% 12.10% 320997.819
256 0.16 82.48% 5.43% 12.09% 320599.612
512 0.32 82.45% 5.43% 12.11% 321153.958
1024 0.64 82.47% 5.41% 12.12% 321338.785
2048 1.28 82.45% 5.41% 12.14% 321598.702
4096 2.56 82.46% 5.41% 12.13% 321493.685
8192 5.12 82.46% 5.40% 12.14% 321946.676
16384 10.24 82.45% 5.41% 12.14% 321664.426
commit: 3f083b2fddc7db57e0ca1052435213d759e68929
LINPACK benchmark test.
elfconv (not optimized)
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 82.23% 5.52% 12.24% 241678.067
2 0.00 81.98% 5.20% 12.83% 259995.094
4 0.00 84.34% 5.43% 10.23% 263977.089
8 0.01 84.20% 5.48% 10.32% 278489.327
16 0.01 84.39% 5.76% 9.85% 273319.152
32 0.02 83.26% 5.61% 11.13% 267297.084
64 0.05 85.02% 5.53% 9.45% 263368.351
128 0.09 84.73% 5.31% 9.97% 275849.730
256 0.18 84.71% 5.50% 9.78% 279862.791
512 0.36 84.67% 5.31% 10.03% 280616.413
1024 0.71 84.77% 5.35% 9.88% 281750.976
2048 1.42 84.77% 5.37% 9.86% 282224.808
4096 2.87 84.78% 5.34% 9.88% 280201.441
8192 5.68 84.75% 5.38% 9.86% 282869.095
16384 11.37 84.80% 5.36% 9.84% 282437.931
elfconv (optimized)
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 84.05% 4.64% 11.31% 237136.465
2 0.00 80.08% 8.93% 11.00% 256782.946
4 0.00 85.41% 4.82% 9.77% 256782.946
8 0.01 83.23% 6.30% 10.48% 273795.686
16 0.01 83.67% 5.36% 10.97% 333529.990
32 0.02 84.17% 5.10% 10.73% 304269.824
64 0.04 83.84% 5.38% 10.79% 323435.742
128 0.07 83.71% 5.33% 10.96% 349645.664
256 0.14 83.55% 5.41% 11.04% 355452.161
512 0.28 83.72% 5.55% 10.73% 359565.331
1024 0.56 83.78% 5.43% 10.79% 360711.720
2048 1.12 83.83% 5.37% 10.80% 362779.779
4096 2.24 83.80% 5.36% 10.83% 362214.486
8192 4.52 83.75% 5.36% 10.89% 359542.820
16384 8.91 83.80% 5.37% 10.84% 364528.111
32768 17.95 83.80% 5.37% 10.83% 361592.626
commit: 919e51d7419fa148b5c4c54e268b8a2c16f43854
Wasm in the browser via elfconv (x86-64 VM on AWS)
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.00 100.00% 0.00% 0.00% 176637.356
2 0.00 100.00% 0.00% 0.00% 176679.472
4 0.00 100.00% 0.00% 0.00% 235553.908
8 0.01 80.00% 0.00% 20.00% 353316.823
16 0.01 76.92% 0.00% 23.08% 282660.197
32 0.02 83.33% 0.00% 16.67% 282663.567
64 0.05 80.43% 4.35% 15.22% 289914.275
128 0.09 86.96% 6.52% 6.52% 262945.841
256 0.19 84.74% 4.74% 10.53% 266037.977
512 0.37 86.41% 5.16% 8.42% 268407.792
1024 0.73 84.46% 3.85% 11.69% 281785.789
2048 1.47 85.69% 5.52% 8.79% 270413.637
4096 2.95 84.97% 5.39% 9.63% 271631.652
8192 5.91 85.78% 5.28% 8.94% 268956.095
16384 11.89 85.58% 5.06% 9.35% 268606.750
Wasm in the browser via container2wasm (x86-64 VM on AWS)
Array size: 100
Memory required: 79K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 100 X 100.
Average rolled and unrolled performance:
Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
----------------------------------------------------
1 0.16 70.53% 4.99% 24.49% 1427.609
2 0.30 77.38% 4.52% 18.10% 1434.652
4 0.59 76.85% 4.97% 18.19% 1464.503
8 1.18 76.70% 4.94% 18.37% 1468.032
16 2.35 77.00% 4.95% 18.04% 1465.631
32 4.71 77.11% 4.93% 17.96% 1463.523
64 9.64 76.93% 4.96% 18.11% 1432.674
128 19.06 77.06% 4.97% 17.97% 1446.058
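Comparisons are probably most meaningful within a single commit, since the issue only states the machine for the 919e51d runs. A small Python sketch computing those within-commit ratios at the largest rep count that both runs reached, using KFLOPS values copied from the output above:

```python
# (candidate KFLOPS, baseline KFLOPS, reps), copied from the LINPACK output
# in this issue at the largest rep count reached by both runs.
comparisons = {
    "5f29035 elfconv vs QEMU": (321664.426, 233981.758, 16384),
    "3f083b2 elfconv optimized vs not optimized": (364528.111, 282437.931, 16384),
    "919e51d elfconv Wasm vs container2wasm Wasm": (262945.841, 1446.058, 128),
}

ratios = {}
for label, (candidate, baseline, reps) in comparisons.items():
    ratios[label] = candidate / baseline
    print(f"{label} @ {reps} reps: {ratios[label]:.2f}x")
```

This gives about 1.37x, 1.29x and 181.84x. The elfconv Wasm row at 128 reps ran for only 0.09 s, so the last ratio is rough; container2wasm was not run beyond 128 reps.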
commit: 1d92b7ea2c5bea998041ca6244b3571ff59855a7