Closed WrathfulSpatula closed 2 years ago
I'm on an Alienware m17 laptop running Ubuntu, and I can get you any other information you need about my hardware, but it looks like some of this is captured in the benchmark run header. (The OpenCL information is a banner that Qrack itself prints.) The X gate benchmark is handled differently from the other gates in the original PR by @codewithsk. I'm not sure whether that's by design, but these are my results with the PR exactly as it stands now:
2021-12-30T22:11:29-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 0.64, 0.95, 1.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
BM_sim_X/4 213259 ns 112948 ns 6278 X
BM_sim_X/5 213206 ns 112665 ns 6195 X
BM_sim_X/6 216765 ns 115058 ns 6133 X
BM_sim_X/7 221760 ns 116908 ns 6080 X
BM_sim_X/8 221702 ns 117289 ns 5847 X
BM_sim_X/9 241606 ns 128802 ns 5908 X
BM_sim_X/10 224935 ns 112945 ns 6217 X
BM_sim_X/11 208898 ns 206672 ns 3402 X
BM_sim_X/12 208507 ns 206667 ns 3337 X
BM_sim_X/13 206484 ns 204964 ns 3394 X
BM_sim_X/14 208688 ns 206654 ns 3281 X
BM_sim_X/15 208174 ns 206324 ns 3348 X
BM_sim_X/16 210380 ns 208174 ns 3355 X
BM_sim_X/17 218046 ns 216240 ns 3168 X
BM_sim_X/18 405332 ns 400874 ns 1748 X
BM_sim_X/19 564538 ns 533984 ns 1354 X
BM_sim_X/20 737173 ns 730341 ns 912 X
BM_sim_X/21 1188068 ns 1179324 ns 570 X
BM_sim_X/22 2057399 ns 2043418 ns 337 X
BM_sim_X/23 3757884 ns 3742428 ns 188 X
BM_sim_X/24 6873899 ns 6851575 ns 101 X
BM_sim_X/25 11484559 ns 11459152 ns 60 X
BM_sim_H/4 548 ns 491 ns 1084844 H
BM_sim_H/5 509 ns 449 ns 1379725 H
BM_sim_H/6 403 ns 362 ns 2034398 H
BM_sim_H/7 354 ns 327 ns 2248607 H
BM_sim_H/8 322 ns 306 ns 2340997 H
BM_sim_H/9 296 ns 286 ns 2415613 H
BM_sim_H/10 277 ns 273 ns 2555268 H
BM_sim_H/11 24422 ns 24419 ns 28658 H
BM_sim_H/12 24117 ns 24115 ns 28891 H
BM_sim_H/13 24613 ns 24610 ns 28469 H
BM_sim_H/14 24696 ns 24695 ns 27988 H
BM_sim_H/15 25724 ns 25722 ns 28385 H
BM_sim_H/16 25621 ns 25619 ns 27282 H
BM_sim_H/17 26688 ns 26685 ns 26094 H
BM_sim_H/18 38309 ns 38305 ns 14620 H
BM_sim_H/19 61861 ns 61859 ns 10369 H
BM_sim_H/20 211105 ns 211092 ns 3254 H
BM_sim_H/21 414356 ns 414331 ns 1696 H
BM_sim_H/22 800980 ns 800958 ns 820 H
BM_sim_H/23 1603827 ns 1603795 ns 432 H
BM_sim_H/24 3181423 ns 3181202 ns 220 H
BM_sim_H/25 6316012 ns 6315664 ns 115 H
BM_sim_T/4 596 ns 531 ns 1063853 T
BM_sim_T/5 586 ns 522 ns 1144787 T
BM_sim_T/6 429 ns 392 ns 1826865 T
BM_sim_T/7 374 ns 350 ns 1969759 T
BM_sim_T/8 354 ns 334 ns 2042546 T
BM_sim_T/9 322 ns 311 ns 2251085 T
BM_sim_T/10 305 ns 298 ns 2341433 T
BM_sim_T/11 23566 ns 23563 ns 29213 T
BM_sim_T/12 24272 ns 24269 ns 29684 T
BM_sim_T/13 23836 ns 23834 ns 29901 T
BM_sim_T/14 23814 ns 23812 ns 29398 T
BM_sim_T/15 25176 ns 25173 ns 27890 T
BM_sim_T/16 30957 ns 30945 ns 25974 T
BM_sim_T/17 38142 ns 38134 ns 18290 T
BM_sim_T/18 52379 ns 52373 ns 11892 T
BM_sim_T/19 95812 ns 95807 ns 6910 T
BM_sim_T/20 369271 ns 369240 ns 1909 T
BM_sim_T/21 720412 ns 720366 ns 930 T
BM_sim_T/22 1425567 ns 1425561 ns 484 T
BM_sim_T/23 2823849 ns 2823788 ns 247 T
BM_sim_T/24 5614832 ns 5614501 ns 127 T
BM_sim_T/25 11145959 ns 11145146 ns 66 T
BM_sim_CNOT/4 677 ns 605 ns 1000000 CNOT
BM_sim_CNOT/5 777 ns 714 ns 989520 CNOT
BM_sim_CNOT/6 575 ns 513 ns 1090034 CNOT
BM_sim_CNOT/7 422 ns 379 ns 1834157 CNOT
BM_sim_CNOT/8 378 ns 349 ns 1955417 CNOT
BM_sim_CNOT/9 341 ns 323 ns 2214292 CNOT
BM_sim_CNOT/10 308 ns 298 ns 2374219 CNOT
BM_sim_CNOT/11 25155 ns 25152 ns 27502 CNOT
BM_sim_CNOT/12 24841 ns 24838 ns 28230 CNOT
BM_sim_CNOT/13 25220 ns 25217 ns 28538 CNOT
BM_sim_CNOT/14 24525 ns 24521 ns 27621 CNOT
BM_sim_CNOT/15 24321 ns 24318 ns 27659 CNOT
BM_sim_CNOT/16 28957 ns 28954 ns 26956 CNOT
BM_sim_CNOT/17 26045 ns 26040 ns 26938 CNOT
BM_sim_CNOT/18 27007 ns 27003 ns 25367 CNOT
BM_sim_CNOT/19 49399 ns 49394 ns 12751 CNOT
BM_sim_CNOT/20 129419 ns 129411 ns 5307 CNOT
BM_sim_CNOT/21 236652 ns 236634 ns 2917 CNOT
BM_sim_CNOT/22 459842 ns 459781 ns 1531 CNOT
BM_sim_CNOT/23 901098 ns 901009 ns 753 CNOT
BM_sim_CNOT/24 1777652 ns 1777616 ns 395 CNOT
BM_sim_CNOT/25 3532641 ns 3532509 ns 199 CNOT
BM_sim_Toffoli/4 646 ns 596 ns 1420794 Toffoli
BM_sim_Toffoli/5 696 ns 640 ns 1119252 Toffoli
BM_sim_Toffoli/6 740 ns 680 ns 1280421 Toffoli
BM_sim_Toffoli/7 684 ns 599 ns 1137688 Toffoli
BM_sim_Toffoli/8 419 ns 379 ns 1783350 Toffoli
BM_sim_Toffoli/9 372 ns 346 ns 2058936 Toffoli
BM_sim_Toffoli/10 328 ns 313 ns 2225554 Toffoli
BM_sim_Toffoli/11 25722 ns 25706 ns 27219 Toffoli
BM_sim_Toffoli/12 25827 ns 25826 ns 27243 Toffoli
BM_sim_Toffoli/13 25751 ns 25747 ns 26671 Toffoli
BM_sim_Toffoli/14 25764 ns 25764 ns 27145 Toffoli
BM_sim_Toffoli/15 29746 ns 29744 ns 26828 Toffoli
BM_sim_Toffoli/16 25661 ns 25660 ns 26922 Toffoli
BM_sim_Toffoli/17 25719 ns 25717 ns 27324 Toffoli
BM_sim_Toffoli/18 26146 ns 26144 ns 26152 Toffoli
BM_sim_Toffoli/19 46794 ns 46789 ns 13846 Toffoli
BM_sim_Toffoli/20 102813 ns 102811 ns 6398 Toffoli
BM_sim_Toffoli/21 184932 ns 184913 ns 3685 Toffoli
BM_sim_Toffoli/22 344639 ns 344618 ns 2036 Toffoli
BM_sim_Toffoli/23 670496 ns 670439 ns 992 Toffoli
BM_sim_Toffoli/24 1329404 ns 1329354 ns 524 Toffoli
BM_sim_Toffoli/25 2601449 ns 2600936 ns 269 Toffoli
BM_sim_Rx/4 625 ns 554 ns 1183200 Rx
BM_sim_Rx/5 547 ns 488 ns 1201053 Rx
BM_sim_Rx/6 412 ns 369 ns 1822742 Rx
BM_sim_Rx/7 360 ns 332 ns 2108205 Rx
BM_sim_Rx/8 318 ns 305 ns 2273275 Rx
BM_sim_Rx/9 299 ns 291 ns 2393503 Rx
BM_sim_Rx/10 284 ns 280 ns 2430357 Rx
BM_sim_Rx/11 25235 ns 25232 ns 28303 Rx
BM_sim_Rx/12 24925 ns 24924 ns 43500 Rx
BM_sim_Rx/13 27080 ns 27069 ns 27805 Rx
BM_sim_Rx/14 25209 ns 25206 ns 28640 Rx
BM_sim_Rx/15 26155 ns 26153 ns 26852 Rx
BM_sim_Rx/16 25757 ns 25753 ns 26530 Rx
BM_sim_Rx/17 26784 ns 26781 ns 26146 Rx
BM_sim_Rx/18 38234 ns 38233 ns 18294 Rx
BM_sim_Rx/19 61962 ns 61954 ns 10220 Rx
BM_sim_Rx/20 222351 ns 222331 ns 3148 Rx
BM_sim_Rx/21 415150 ns 415083 ns 1686 Rx
BM_sim_Rx/22 820492 ns 820483 ns 817 Rx
BM_sim_Rx/23 1611268 ns 1611238 ns 426 Rx
BM_sim_Rx/24 3210008 ns 3209779 ns 221 Rx
BM_sim_Rx/25 6339580 ns 6338987 ns 113 Rx
BM_sim_Ry/4 675 ns 598 ns 1234713 Ry
BM_sim_Ry/5 740 ns 646 ns 1059282 Ry
BM_sim_Ry/6 447 ns 402 ns 1741965 Ry
BM_sim_Ry/7 388 ns 356 ns 1937756 Ry
BM_sim_Ry/8 338 ns 319 ns 2212755 Ry
BM_sim_Ry/9 314 ns 303 ns 2337792 Ry
BM_sim_Ry/10 296 ns 291 ns 2437109 Ry
BM_sim_Ry/11 27298 ns 27294 ns 28810 Ry
BM_sim_Ry/12 23882 ns 23879 ns 28938 Ry
BM_sim_Ry/13 24517 ns 24514 ns 29010 Ry
BM_sim_Ry/14 24373 ns 24370 ns 28657 Ry
BM_sim_Ry/15 24638 ns 24635 ns 28010 Ry
BM_sim_Ry/16 25001 ns 24998 ns 27480 Ry
BM_sim_Ry/17 27003 ns 27002 ns 26122 Ry
BM_sim_Ry/18 38464 ns 38459 ns 18384 Ry
BM_sim_Ry/19 62148 ns 62140 ns 10262 Ry
BM_sim_Ry/20 222616 ns 222586 ns 3145 Ry
BM_sim_Ry/21 415209 ns 415147 ns 1685 Ry
BM_sim_Ry/22 822340 ns 822322 ns 821 Ry
BM_sim_Ry/23 1616264 ns 1614425 ns 430 Ry
BM_sim_Ry/24 3212534 ns 3212245 ns 219 Ry
BM_sim_Ry/25 6340322 ns 6339099 ns 110 Ry
Something tells me these should all initialize the simulator outside of the loop and call Finish() before returning, for the desired comparison. (Qrack is mostly asynchronous, depending on the conditions, and Finish() blocks for completion.)
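To make the timing concern concrete, here is a minimal sketch in Python, not Qrack code. `AsyncSimulator`, its `x()` and `finish()` methods, and `bench()` are hypothetical stand-ins invented for illustration; they only mimic the idea of gates being queued asynchronously and a blocking call that waits for completion. The sketch constructs the simulator once, outside the timed loop, and shows that the clock has to be read after the blocking call, or else only dispatch time is measured.

```python
import queue
import threading
import time


class AsyncSimulator:
    """Hypothetical stand-in for an asynchronous simulator (not Qrack's API)."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.applied = 0
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Background worker: executes queued gates one at a time.
        while True:
            gate = self._tasks.get()
            gate()
            self.applied += 1
            self._tasks.task_done()

    def x(self, work):
        # Queues the gate and returns immediately, without waiting for it.
        self._tasks.put(work)

    def finish(self):
        # Blocks until every queued gate has actually run.
        self._tasks.join()


def bench(n_gates, gate_seconds=0.001):
    sim = AsyncSimulator()  # built once, outside the timed region
    start = time.perf_counter()
    for _ in range(n_gates):
        sim.x(lambda: time.sleep(gate_seconds))
    dispatched = time.perf_counter() - start  # only the cost of queuing
    sim.finish()
    total = time.perf_counter() - start  # includes the real work
    return dispatched, total, sim.applied


if __name__ == "__main__":
    dispatched, total, applied = bench(20)
    print(f"dispatch only: {dispatched:.6f} s, with finish: {total:.6f} s, gates run: {applied}")
```

Stopping the clock at `dispatched` would under-report the work, which is the kind of distortion the comment above is worried about.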
Sorry to spam, but these results look more like what I'd expect, after c5c013a:
2021-12-30T22:34:50-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 0.95, 0.74, 0.75
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
BM_sim_X/4 73620 ns 36897 ns 19254 X
BM_sim_X/5 73509 ns 36964 ns 19097 X
BM_sim_X/6 73321 ns 36781 ns 18597 X
BM_sim_X/7 74625 ns 36009 ns 19213 X
BM_sim_X/8 80564 ns 40296 ns 16508 X
BM_sim_X/9 90610 ns 43907 ns 17100 X
BM_sim_X/10 87139 ns 38694 ns 18058 X
BM_sim_X/11 25241 ns 25239 ns 27651 X
BM_sim_X/12 25197 ns 25196 ns 27296 X
BM_sim_X/13 25019 ns 25016 ns 27704 X
BM_sim_X/14 25041 ns 25040 ns 27705 X
BM_sim_X/15 25400 ns 25397 ns 27269 X
BM_sim_X/16 26712 ns 26711 ns 26871 X
BM_sim_X/17 36641 ns 36640 ns 19033 X
BM_sim_X/18 57733 ns 57730 ns 13022 X
BM_sim_X/19 71597 ns 71586 ns 8977 X
BM_sim_X/20 220346 ns 220346 ns 3151 X
BM_sim_X/21 412093 ns 412045 ns 1705 X
BM_sim_X/22 790942 ns 790908 ns 825 X
BM_sim_X/23 1574414 ns 1574120 ns 437 X
BM_sim_X/24 3049004 ns 3048734 ns 224 X
BM_sim_X/25 6197424 ns 6196762 ns 105 X
BM_sim_H/4 72486 ns 36369 ns 19443 H
BM_sim_H/5 72338 ns 36019 ns 18792 H
BM_sim_H/6 79231 ns 38708 ns 19516 H
BM_sim_H/7 72443 ns 35189 ns 19887 H
BM_sim_H/8 74737 ns 35122 ns 20150 H
BM_sim_H/9 84327 ns 41577 ns 17572 H
BM_sim_H/10 86417 ns 38103 ns 18561 H
BM_sim_H/11 28224 ns 27913 ns 24093 H
BM_sim_H/12 30942 ns 30940 ns 17830 H
BM_sim_H/13 27166 ns 27165 ns 21197 H
BM_sim_H/14 34032 ns 34031 ns 26009 H
BM_sim_H/15 31409 ns 31408 ns 25939 H
BM_sim_H/16 36161 ns 36158 ns 18071 H
BM_sim_H/17 40351 ns 40350 ns 16937 H
BM_sim_H/18 51431 ns 51427 ns 13647 H
BM_sim_H/19 74466 ns 74466 ns 8001 H
BM_sim_H/20 243701 ns 243683 ns 2863 H
BM_sim_H/21 489477 ns 489474 ns 1552 H
BM_sim_H/22 881362 ns 881259 ns 794 H
BM_sim_H/23 1907669 ns 1907434 ns 381 H
BM_sim_H/24 3563037 ns 3560809 ns 192 H
BM_sim_H/25 7144112 ns 7143757 ns 94 H
BM_sim_T/4 72935 ns 36658 ns 19374 T
BM_sim_T/5 72518 ns 36258 ns 19062 T
BM_sim_T/6 72168 ns 35931 ns 19356 T
BM_sim_T/7 72583 ns 35302 ns 19824 T
BM_sim_T/8 78908 ns 37761 ns 20267 T
BM_sim_T/9 84283 ns 43088 ns 16116 T
BM_sim_T/10 84578 ns 38228 ns 18319 T
BM_sim_T/11 27394 ns 27392 ns 25584 T
BM_sim_T/12 27978 ns 27796 ns 25481 T
BM_sim_T/13 27966 ns 27948 ns 25204 T
BM_sim_T/14 27734 ns 27733 ns 25132 T
BM_sim_T/15 27846 ns 27830 ns 24761 T
BM_sim_T/16 38472 ns 38468 ns 18023 T
BM_sim_T/17 49993 ns 49957 ns 12264 T
BM_sim_T/18 68159 ns 68155 ns 10646 T
BM_sim_T/19 107429 ns 107357 ns 6183 T
BM_sim_T/20 381872 ns 381844 ns 1831 T
BM_sim_T/21 733143 ns 733052 ns 890 T
BM_sim_T/22 1450108 ns 1449967 ns 464 T
BM_sim_T/23 2857520 ns 2857273 ns 242 T
BM_sim_T/24 5679711 ns 5679333 ns 110 T
BM_sim_T/25 11353450 ns 11352924 ns 59 T
BM_sim_CNOT/4 71175 ns 35409 ns 19865 CNOT
BM_sim_CNOT/5 78812 ns 39786 ns 19291 CNOT
BM_sim_CNOT/6 71976 ns 36004 ns 19335 CNOT
BM_sim_CNOT/7 71894 ns 35786 ns 19648 CNOT
BM_sim_CNOT/8 72030 ns 35230 ns 19891 CNOT
BM_sim_CNOT/9 82921 ns 44106 ns 16039 CNOT
BM_sim_CNOT/10 84829 ns 43329 ns 16130 CNOT
BM_sim_CNOT/11 27472 ns 27471 ns 25737 CNOT
BM_sim_CNOT/12 27477 ns 27475 ns 18510 CNOT
BM_sim_CNOT/13 27448 ns 27445 ns 25470 CNOT
BM_sim_CNOT/14 27802 ns 27800 ns 25189 CNOT
BM_sim_CNOT/15 27631 ns 27630 ns 24541 CNOT
BM_sim_CNOT/16 27869 ns 27866 ns 25046 CNOT
BM_sim_CNOT/17 38741 ns 38731 ns 18111 CNOT
BM_sim_CNOT/18 38856 ns 38845 ns 17894 CNOT
BM_sim_CNOT/19 61823 ns 61797 ns 10270 CNOT
BM_sim_CNOT/20 142511 ns 142481 ns 4697 CNOT
BM_sim_CNOT/21 251819 ns 251788 ns 2779 CNOT
BM_sim_CNOT/22 470824 ns 470776 ns 1318 CNOT
BM_sim_CNOT/23 917129 ns 917085 ns 584 CNOT
BM_sim_CNOT/24 1799441 ns 1799201 ns 385 CNOT
BM_sim_CNOT/25 3568946 ns 3568409 ns 196 CNOT
BM_sim_Toffoli/4 70170 ns 34896 ns 19917 Toffoli
BM_sim_Toffoli/5 70068 ns 34889 ns 19548 Toffoli
BM_sim_Toffoli/6 70963 ns 35451 ns 19495 Toffoli
BM_sim_Toffoli/7 71479 ns 35846 ns 19370 Toffoli
BM_sim_Toffoli/8 78439 ns 38282 ns 19669 Toffoli
BM_sim_Toffoli/9 72227 ns 35228 ns 19933 Toffoli
BM_sim_Toffoli/10 73801 ns 35011 ns 20157 Toffoli
BM_sim_Toffoli/11 39220 ns 39217 ns 13228 Toffoli
BM_sim_Toffoli/12 38680 ns 38678 ns 19120 Toffoli
BM_sim_Toffoli/13 38529 ns 38529 ns 18850 Toffoli
BM_sim_Toffoli/14 38874 ns 38873 ns 17835 Toffoli
BM_sim_Toffoli/15 40048 ns 40046 ns 17933 Toffoli
BM_sim_Toffoli/16 50383 ns 50382 ns 10000 Toffoli
BM_sim_Toffoli/17 41631 ns 41630 ns 16397 Toffoli
BM_sim_Toffoli/18 46316 ns 46312 ns 15017 Toffoli
BM_sim_Toffoli/19 60038 ns 60035 ns 11109 Toffoli
BM_sim_Toffoli/20 115281 ns 115278 ns 5939 Toffoli
BM_sim_Toffoli/21 200074 ns 200057 ns 3454 Toffoli
BM_sim_Toffoli/22 363645 ns 363577 ns 1915 Toffoli
BM_sim_Toffoli/23 685941 ns 685897 ns 982 Toffoli
BM_sim_Toffoli/24 1338329 ns 1338162 ns 513 Toffoli
BM_sim_Toffoli/25 2635010 ns 2634819 ns 265 Toffoli
BM_sim_Rx/4 72102 ns 35963 ns 19794 Rx
BM_sim_Rx/5 80650 ns 40312 ns 17686 Rx
BM_sim_Rx/6 72756 ns 36064 ns 19231 Rx
BM_sim_Rx/7 73218 ns 35729 ns 19798 Rx
BM_sim_Rx/8 74390 ns 35158 ns 19986 Rx
BM_sim_Rx/9 84958 ns 42380 ns 16617 Rx
BM_sim_Rx/10 92790 ns 40361 ns 18615 Rx
BM_sim_Rx/11 27356 ns 27353 ns 25717 Rx
BM_sim_Rx/12 27499 ns 27494 ns 25093 Rx
BM_sim_Rx/13 27236 ns 27233 ns 25461 Rx
BM_sim_Rx/14 28113 ns 28110 ns 25462 Rx
BM_sim_Rx/15 27886 ns 27882 ns 23699 Rx
BM_sim_Rx/16 32127 ns 32125 ns 22571 Rx
BM_sim_Rx/17 40390 ns 40389 ns 17317 Rx
BM_sim_Rx/18 52093 ns 52092 ns 12340 Rx
BM_sim_Rx/19 76243 ns 76237 ns 8684 Rx
BM_sim_Rx/20 233759 ns 233754 ns 3021 Rx
BM_sim_Rx/21 429624 ns 429511 ns 1566 Rx
BM_sim_Rx/22 832338 ns 832121 ns 819 Rx
BM_sim_Rx/23 1627797 ns 1627682 ns 381 Rx
BM_sim_Rx/24 3208769 ns 3208125 ns 214 Rx
BM_sim_Rx/25 6565038 ns 6564464 ns 104 Rx
BM_sim_Ry/4 72129 ns 36269 ns 19373 Ry
BM_sim_Ry/5 72619 ns 36437 ns 19428 Ry
BM_sim_Ry/6 72641 ns 36040 ns 19361 Ry
BM_sim_Ry/7 79539 ns 38602 ns 19708 Ry
BM_sim_Ry/8 73979 ns 35249 ns 20036 Ry
BM_sim_Ry/9 84181 ns 42515 ns 16534 Ry
BM_sim_Ry/10 85892 ns 37822 ns 18464 Ry
BM_sim_Ry/11 28201 ns 28198 ns 24179 Ry
BM_sim_Ry/12 27637 ns 27636 ns 24850 Ry
BM_sim_Ry/13 27421 ns 27420 ns 25703 Ry
BM_sim_Ry/14 27380 ns 27379 ns 25514 Ry
BM_sim_Ry/15 28935 ns 28933 ns 25849 Ry
BM_sim_Ry/16 30196 ns 30191 ns 23309 Ry
BM_sim_Ry/17 38961 ns 38959 ns 18040 Ry
BM_sim_Ry/18 50386 ns 50379 ns 13242 Ry
BM_sim_Ry/19 74318 ns 74314 ns 9136 Ry
BM_sim_Ry/20 234805 ns 234799 ns 3021 Ry
BM_sim_Ry/21 432908 ns 432869 ns 1643 Ry
BM_sim_Ry/22 831943 ns 831909 ns 785 Ry
BM_sim_Ry/23 1688951 ns 1688923 ns 424 Ry
BM_sim_Ry/24 3271922 ns 3271853 ns 210 Ry
BM_sim_Ry/25 6484537 ns 6484107 ns 104 Ry
Last shot at this, with apologies for rapid-fire comments: with c055501, I thought that if timing is on the basis of the entire function, we can capture the effective benefit of Qrack's asynchronous execution by calling Finish() only once at the end of the test, but please correct me if I have misunderstood the framework.
Here's where we stand on my laptop, and please correct me if I have abused the benchmark framework. I'd rather this was apples-to-apples, wherever Qrack stands comparatively on pure "ket" simulation:
2021-12-30T23:22:25-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 0.34, 0.44, 0.48
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
BM_sim_X/4 715 ns 632 ns 1119798 X
BM_sim_X/5 624 ns 560 ns 1029253 X
BM_sim_X/6 451 ns 415 ns 1394891 X
BM_sim_X/7 409 ns 387 ns 1805975 X
BM_sim_X/8 380 ns 365 ns 1839478 X
BM_sim_X/9 378 ns 369 ns 1999946 X
BM_sim_X/10 356 ns 350 ns 2046663 X
BM_sim_X/11 22381 ns 22380 ns 30632 X
BM_sim_X/12 17254 ns 17252 ns 32276 X
BM_sim_X/13 22685 ns 22683 ns 35592 X
BM_sim_X/14 15013 ns 15012 ns 34353 X
BM_sim_X/15 15040 ns 15040 ns 46784 X
BM_sim_X/16 16092 ns 16092 ns 36889 X
BM_sim_X/17 26787 ns 26782 ns 25980 X
BM_sim_X/18 38092 ns 38088 ns 18504 X
BM_sim_X/19 61161 ns 61153 ns 10679 X
BM_sim_X/20 211708 ns 211690 ns 3097 X
BM_sim_X/21 404082 ns 404033 ns 1752 X
BM_sim_X/22 787886 ns 787783 ns 864 X
BM_sim_X/23 1563176 ns 1562965 ns 442 X
BM_sim_X/24 3026656 ns 3026472 ns 226 X
BM_sim_X/25 6165349 ns 6165078 ns 120 X
BM_sim_H/4 728 ns 641 ns 1350027 H
BM_sim_H/5 525 ns 474 ns 1180350 H
BM_sim_H/6 410 ns 380 ns 1751386 H
BM_sim_H/7 376 ns 353 ns 2031210 H
BM_sim_H/8 346 ns 331 ns 2110597 H
BM_sim_H/9 340 ns 330 ns 2185749 H
BM_sim_H/10 346 ns 339 ns 2127119 H
BM_sim_H/11 22398 ns 22397 ns 31979 H
BM_sim_H/12 22154 ns 22154 ns 31469 H
BM_sim_H/13 21696 ns 21694 ns 32669 H
BM_sim_H/14 22625 ns 22624 ns 34865 H
BM_sim_H/15 24781 ns 24779 ns 28563 H
BM_sim_H/16 26008 ns 26005 ns 26738 H
BM_sim_H/17 26848 ns 26848 ns 25846 H
BM_sim_H/18 38348 ns 38346 ns 14273 H
BM_sim_H/19 63220 ns 63196 ns 10030 H
BM_sim_H/20 222427 ns 222415 ns 3095 H
BM_sim_H/21 417771 ns 417766 ns 1692 H
BM_sim_H/22 811376 ns 811313 ns 824 H
BM_sim_H/23 1616207 ns 1616176 ns 425 H
BM_sim_H/24 3205480 ns 3205312 ns 218 H
BM_sim_H/25 6370058 ns 6370056 ns 108 H
BM_sim_T/4 717 ns 637 ns 1215602 T
BM_sim_T/5 587 ns 523 ns 1086015 T
BM_sim_T/6 436 ns 397 ns 1701196 T
BM_sim_T/7 380 ns 360 ns 1939474 T
BM_sim_T/8 364 ns 344 ns 2005481 T
BM_sim_T/9 351 ns 342 ns 2111346 T
BM_sim_T/10 343 ns 338 ns 2089475 T
BM_sim_T/11 22577 ns 22576 ns 30130 T
BM_sim_T/12 21916 ns 21915 ns 30650 T
BM_sim_T/13 23063 ns 23063 ns 29928 T
BM_sim_T/14 23589 ns 23587 ns 29536 T
BM_sim_T/15 25484 ns 25483 ns 27702 T
BM_sim_T/16 26458 ns 26457 ns 25937 T
BM_sim_T/17 43570 ns 43565 ns 18343 T
BM_sim_T/18 60569 ns 60562 ns 10415 T
BM_sim_T/19 96977 ns 96973 ns 6546 T
BM_sim_T/20 379615 ns 379398 ns 1850 T
BM_sim_T/21 729255 ns 729208 ns 900 T
BM_sim_T/22 1435540 ns 1434500 ns 475 T
BM_sim_T/23 2853440 ns 2853381 ns 247 T
BM_sim_T/24 5664120 ns 5663730 ns 122 T
BM_sim_T/25 11243131 ns 11243213 ns 67 T
BM_sim_CNOT/4 632 ns 560 ns 1000000 CNOT
BM_sim_CNOT/5 605 ns 540 ns 1136015 CNOT
BM_sim_CNOT/6 728 ns 652 ns 1015970 CNOT
BM_sim_CNOT/7 466 ns 423 ns 1344763 CNOT
BM_sim_CNOT/8 378 ns 358 ns 1876809 CNOT
BM_sim_CNOT/9 369 ns 353 ns 1983322 CNOT
BM_sim_CNOT/10 403 ns 388 ns 2075209 CNOT
BM_sim_CNOT/11 23250 ns 23249 ns 31227 CNOT
BM_sim_CNOT/12 23835 ns 23834 ns 28826 CNOT
BM_sim_CNOT/13 23409 ns 23406 ns 29293 CNOT
BM_sim_CNOT/14 23761 ns 23757 ns 29621 CNOT
BM_sim_CNOT/15 23495 ns 23493 ns 29198 CNOT
BM_sim_CNOT/16 26407 ns 26405 ns 26702 CNOT
BM_sim_CNOT/17 26769 ns 26768 ns 25991 CNOT
BM_sim_CNOT/18 27915 ns 27914 ns 25448 CNOT
BM_sim_CNOT/19 49777 ns 49770 ns 12705 CNOT
BM_sim_CNOT/20 129554 ns 129543 ns 5199 CNOT
BM_sim_CNOT/21 256629 ns 256594 ns 2866 CNOT
BM_sim_CNOT/22 461934 ns 461758 ns 1329 CNOT
BM_sim_CNOT/23 907218 ns 904537 ns 748 CNOT
BM_sim_CNOT/24 1794867 ns 1789897 ns 391 CNOT
BM_sim_CNOT/25 3562287 ns 3551268 ns 198 CNOT
BM_sim_Toffoli/4 507 ns 454 ns 1471194 Toffoli
BM_sim_Toffoli/5 685 ns 632 ns 1301404 Toffoli
BM_sim_Toffoli/6 768 ns 716 ns 989815 Toffoli
BM_sim_Toffoli/7 767 ns 679 ns 1106098 Toffoli
BM_sim_Toffoli/8 475 ns 427 ns 1334113 Toffoli
BM_sim_Toffoli/9 391 ns 368 ns 1902690 Toffoli
BM_sim_Toffoli/10 385 ns 370 ns 1962712 Toffoli
BM_sim_Toffoli/11 25977 ns 25976 ns 27098 Toffoli
BM_sim_Toffoli/12 25873 ns 25872 ns 26669 Toffoli
BM_sim_Toffoli/13 25788 ns 25787 ns 27132 Toffoli
BM_sim_Toffoli/14 25891 ns 25889 ns 27039 Toffoli
BM_sim_Toffoli/15 26072 ns 26069 ns 26593 Toffoli
BM_sim_Toffoli/16 25902 ns 25900 ns 27050 Toffoli
BM_sim_Toffoli/17 26120 ns 26105 ns 26793 Toffoli
BM_sim_Toffoli/18 31619 ns 31614 ns 25620 Toffoli
BM_sim_Toffoli/19 48169 ns 48163 ns 12789 Toffoli
BM_sim_Toffoli/20 105447 ns 105442 ns 6300 Toffoli
BM_sim_Toffoli/21 185478 ns 185468 ns 3649 Toffoli
BM_sim_Toffoli/22 346589 ns 346534 ns 2008 Toffoli
BM_sim_Toffoli/23 675066 ns 674914 ns 968 Toffoli
BM_sim_Toffoli/24 1331968 ns 1330898 ns 515 Toffoli
BM_sim_Toffoli/25 2629952 ns 2629680 ns 265 Toffoli
BM_sim_Rx/4 719 ns 663 ns 1106929 Rx
BM_sim_Rx/5 532 ns 475 ns 1280681 Rx
BM_sim_Rx/6 434 ns 398 ns 1829955 Rx
BM_sim_Rx/7 384 ns 358 ns 1836666 Rx
BM_sim_Rx/8 378 ns 356 ns 2062166 Rx
BM_sim_Rx/9 356 ns 343 ns 2064817 Rx
BM_sim_Rx/10 341 ns 336 ns 2086995 Rx
BM_sim_Rx/11 23705 ns 23703 ns 21262 Rx
BM_sim_Rx/12 22926 ns 22909 ns 30208 Rx
BM_sim_Rx/13 22412 ns 22409 ns 29923 Rx
BM_sim_Rx/14 23252 ns 23237 ns 29999 Rx
BM_sim_Rx/15 25394 ns 25392 ns 27623 Rx
BM_sim_Rx/16 25146 ns 25130 ns 27066 Rx
BM_sim_Rx/17 26723 ns 26721 ns 26276 Rx
BM_sim_Rx/18 38209 ns 38198 ns 18392 Rx
BM_sim_Rx/19 62591 ns 62591 ns 9756 Rx
BM_sim_Rx/20 222190 ns 222185 ns 3165 Rx
BM_sim_Rx/21 417478 ns 417456 ns 1677 Rx
BM_sim_Rx/22 821739 ns 821690 ns 811 Rx
BM_sim_Rx/23 1619233 ns 1619143 ns 424 Rx
BM_sim_Rx/24 3230909 ns 3230844 ns 218 Rx
BM_sim_Rx/25 6374552 ns 6372848 ns 109 Rx
BM_sim_Ry/4 750 ns 665 ns 1226157 Ry
BM_sim_Ry/5 579 ns 519 ns 1119128 Ry
BM_sim_Ry/6 422 ns 387 ns 1762334 Ry
BM_sim_Ry/7 381 ns 357 ns 1904044 Ry
BM_sim_Ry/8 364 ns 348 ns 2055238 Ry
BM_sim_Ry/9 354 ns 345 ns 2019379 Ry
BM_sim_Ry/10 355 ns 351 ns 2018484 Ry
BM_sim_Ry/11 23131 ns 23130 ns 31061 Ry
BM_sim_Ry/12 23137 ns 23137 ns 30620 Ry
BM_sim_Ry/13 23140 ns 23136 ns 30239 Ry
BM_sim_Ry/14 23331 ns 23330 ns 30325 Ry
BM_sim_Ry/15 25961 ns 25959 ns 26672 Ry
BM_sim_Ry/16 26525 ns 26523 ns 26316 Ry
BM_sim_Ry/17 26798 ns 26793 ns 26204 Ry
BM_sim_Ry/18 38467 ns 38467 ns 18081 Ry
BM_sim_Ry/19 69862 ns 69862 ns 11228 Ry
BM_sim_Ry/20 221985 ns 221962 ns 3153 Ry
BM_sim_Ry/21 421635 ns 421581 ns 1661 Ry
BM_sim_Ry/22 822811 ns 822737 ns 857 Ry
BM_sim_Ry/23 1617523 ns 1617524 ns 436 Ry
BM_sim_Ry/24 3215180 ns 3214591 ns 220 Ry
BM_sim_Ry/25 6354351 ns 6353519 ns 119 Ry
The discontinuity at around 11 qubits is likely due to "hybridization" between CPU and GPU, by the way, as this is the threshold to switch from one device to the other. Maybe the threshold could be tuned better, but, by wall clock time in the Qrack benchmark suite, CPU time appears to start doubling with each added qubit above about this threshold, whereas the GPU might still drag slightly at the same qubit width because it fails to occupy all of its processing elements.
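As a rough check on the shape of these numbers, the sketch below computes growth ratios between neighboring qubit widths, using wall-clock times copied from the BM_sim_X rows of the last run posted above. The helper name `growth_ratio` is made up for this illustration. It shows a large jump from 10 to 11 qubits, where the device switch is described, and ratios close to 2 per added qubit from 21 to 25 qubits.

```python
# Wall-clock times in ns for BM_sim_X, copied from the last run posted above.
WALL_NS = {
    10: 356,
    11: 22381,
    20: 211708,
    21: 404082,
    22: 787886,
    23: 1563176,
    24: 3026656,
    25: 6165349,
}


def growth_ratio(times, width):
    """Time at `width` qubits divided by the time at `width - 1` qubits."""
    return times[width] / times[width - 1]


# Jump across the CPU/GPU switch-over described above.
device_switch_jump = growth_ratio(WALL_NS, 11)

# Per-qubit growth at the wide end of the sweep.
per_qubit = {n: round(growth_ratio(WALL_NS, n), 2) for n in range(21, 26)}

if __name__ == "__main__":
    print(f"10 -> 11 qubits: x{device_switch_jump:.1f}")
    for n, ratio in per_qubit.items():
        print(f"{n - 1} -> {n} qubits: x{ratio}")
```

A ratio near 2 is what a full state vector should give, since each added qubit doubles the number of amplitudes.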
Very impressive work! This result seems to make sense in general. I'm wondering if you could also implement the variational circuit benchmark for the sake of completeness?
And as I mentioned in the issue, if you want to show off the algorithms, it's OK to include them, but just remember to have a notes page explaining what the algorithms are and what their advantages and limitations are. Then when I run the results I'll put a footnote similar to ddqsim.
Thanks for the quick turnaround, by the way! As it's New Year's Eve, I might not have time today, but I can definitely figure out how to implement the variational circuit as well, likely over the weekend. Since you say it's alright, I'll also add benchmarks and a notes document for our "default optimal layer stack," which is mostly a combination of Schmidt decomposition on kets and extended stabilizer subsystems that can also transparently fall back to ket, in addition to this underlying CPU/GPU "hybridized" ket simulation. Just to brag even more, we even switch over to an Intel-QS-like "paged" simulation once maximum single GPU allocation is exceeded, which gives about 2 additional qubits of width, since NVIDIA GPUs are almost universally "chopped" into 4 equal maximum allocation segments that OpenCL has to manually manage, while I think CUDA can allocate over the full VRAM transparently instead. All of the above works for multi-GPU or distributed simulation, as well.
I hope the community benefits from the Qrack contributors' work, and thank you for including us! Happy New Year!
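The "about 2 additional qubits" figure follows from simple arithmetic: using four equal segments instead of one quadruples the available memory, and each added qubit doubles the state vector, so the gain is log2(4) = 2. Here is a small sketch of that calculation. The 8 bytes per amplitude (single-precision complex) and the 16 GiB card are assumptions chosen for illustration, the function name `max_qubits` is hypothetical, and any allocation overhead is ignored.

```python
def max_qubits(alloc_bytes, bytes_per_amplitude=8):
    """Largest n such that 2**n amplitudes fit in alloc_bytes.

    bytes_per_amplitude=8 assumes single-precision complex amplitudes.
    """
    return (alloc_bytes // bytes_per_amplitude).bit_length() - 1


VRAM_BYTES = 16 * 2**30  # hypothetical 16 GiB card, for illustration only

# One maximum-size allocation: a quarter of VRAM, per the comment above.
single_segment = max_qubits(VRAM_BYTES // 4)

# "Paged" across all four segments.
paged = max_qubits(VRAM_BYTES)

if __name__ == "__main__":
    print(f"single segment: {single_segment} qubits")
    print(f"paged over 4 segments: {paged} qubits")
    print(f"gain: {paged - single_segment} qubits")
```

The absolute widths depend entirely on the assumed card size and precision; only the difference of 2 comes from the four-way split itself.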
I just added 1e4ef43 with the variational benchmark.
I'm very sorry for the delay, by the way. I was ill for a good chunk of January, unfortunately, but I feel much better, now!
If I were to add Qrack's default optimized "layer stack" benchmarks (for which X, for example, and many of the other benchmarks are trivial), I could use your advice on where the best place to put them is, though.
40b73c0 adds a const bool that can be toggled in code to switch between the default optimal stack and "ket" only. It's set for "ket" by default, as is appropriate.
Lastly, for 7bce003, it looks like ComputeStatistics() with the "min" estimator is used for QuEST, so I take it that it should be used here, too. (It might have been turned off while the default optimal layer stack was in use.)
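For readers unfamiliar with why a "min" estimator is a common choice for timing data, here is a small Python sketch. It is not the Google Benchmark API, and the sample timings are made-up illustrative numbers, not measurements from this PR. The idea is that noise from things like CPU frequency scaling or background load mostly adds time, so the minimum over repetitions is less affected by one slow outlier than the mean is.

```python
import statistics


def summarize(samples_ns):
    """Reduce repeated timings of one benchmark to a few estimators."""
    return {
        "min": min(samples_ns),
        "median": statistics.median(samples_ns),
        "mean": statistics.mean(samples_ns),
    }


# Made-up repetitions of one benchmark, in ns: five quiet runs and one
# run disturbed by background load.
samples = [25019, 25041, 25197, 25241, 25400, 31000]

if __name__ == "__main__":
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.0f} ns")
```

With these numbers the single disturbed run pulls the mean upward, while the minimum stays at the fastest quiet run.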
thanks for this awesome work, do you want to do the default benchmark in this PR or a separate one? this PR currently looks good to me, I can merge it first if you want.
If I were to add Qrack's default optimized "layer stack" benchmarks (for which X, for example, and many of the other benchmarks are trivial), I could use your advice on where the best place to put them is, though.
just use a separate folder for different build options etc.?
@Roger-luo Thanks for circling back quickly! The bool with a comment at the top of the benchmark file serves much the same purpose as a separate folder, without duplicating the code, up to that one setting. It generally bothers me when code needs to be changed by the user to invoke intended functionality, but I think your users can figure this one out, if they have reason to want that bool.
LGTM, too. Merge away!
By the way, the build assumes that your benchmark instance has libOpenCL and an OpenCL ICD for the same GPU you use for CUDA benchmarks. You probably already have both in that environment, but we'll see.
(I could make the setup.sh script set those up if they aren't present, but they often already are when the environment is configured for CUDA development with the toolkit.)
cool, thanks!
it would take some time to update the benchmark results after my refactor of the benchmark suite tho, I'll try to take care of the deps in the refactor, we will see
Thank you! If libOpenCL.so and the ICD are present, as they usually are once the CUDA toolkit is installed, the only other build dependency I can think of, besides the cmake and g++ that are already in there, is the OpenCL C++ headers, which you can get with sudo apt install opencl-headers. Come find me if you can't figure it out, though.
Building upon the work of https://github.com/yardstiq/quantum-benchmarks/pull/24 (with thanks to @codewithsk): if the comparison is meant to be strictly ket, this limits Qrack optimization "layers" to just that, with OpenCL GPU acceleration hybridized with CPU-based ket simulation.