yardstiq / quantum-benchmarks


Qrack (strictly "ket") #50

Closed WrathfulSpatula closed 2 years ago

WrathfulSpatula commented 2 years ago

This builds on the work in https://github.com/yardstiq/quantum-benchmarks/pull/24 (with thanks to @codewithsk). If the comparison is meant to be strictly "ket" simulation, this PR limits Qrack's optimization "layers" to just that: OpenCL GPU acceleration, hybridized with CPU-based ket simulation.

WrathfulSpatula commented 2 years ago

I'm on an Alienware m17 laptop running Ubuntu. I can get you any other hardware information you need, though some of it is already captured in the benchmark run header. (The OpenCL information is a banner that Qrack itself prints.) The X gate benchmark is handled differently from the other gates in the original PR by @codewithsk, and I'm not sure whether that's by design. These are my results with the PR exactly as it stands now:

2021-12-30T22:11:29-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.64, 0.95, 1.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_sim_X/4            213259 ns       112948 ns         6278 X
BM_sim_X/5            213206 ns       112665 ns         6195 X
BM_sim_X/6            216765 ns       115058 ns         6133 X
BM_sim_X/7            221760 ns       116908 ns         6080 X
BM_sim_X/8            221702 ns       117289 ns         5847 X
BM_sim_X/9            241606 ns       128802 ns         5908 X
BM_sim_X/10           224935 ns       112945 ns         6217 X
BM_sim_X/11           208898 ns       206672 ns         3402 X
BM_sim_X/12           208507 ns       206667 ns         3337 X
BM_sim_X/13           206484 ns       204964 ns         3394 X
BM_sim_X/14           208688 ns       206654 ns         3281 X
BM_sim_X/15           208174 ns       206324 ns         3348 X
BM_sim_X/16           210380 ns       208174 ns         3355 X
BM_sim_X/17           218046 ns       216240 ns         3168 X
BM_sim_X/18           405332 ns       400874 ns         1748 X
BM_sim_X/19           564538 ns       533984 ns         1354 X
BM_sim_X/20           737173 ns       730341 ns          912 X
BM_sim_X/21          1188068 ns      1179324 ns          570 X
BM_sim_X/22          2057399 ns      2043418 ns          337 X
BM_sim_X/23          3757884 ns      3742428 ns          188 X
BM_sim_X/24          6873899 ns      6851575 ns          101 X
BM_sim_X/25         11484559 ns     11459152 ns           60 X
BM_sim_H/4               548 ns          491 ns      1084844 H
BM_sim_H/5               509 ns          449 ns      1379725 H
BM_sim_H/6               403 ns          362 ns      2034398 H
BM_sim_H/7               354 ns          327 ns      2248607 H
BM_sim_H/8               322 ns          306 ns      2340997 H
BM_sim_H/9               296 ns          286 ns      2415613 H
BM_sim_H/10              277 ns          273 ns      2555268 H
BM_sim_H/11            24422 ns        24419 ns        28658 H
BM_sim_H/12            24117 ns        24115 ns        28891 H
BM_sim_H/13            24613 ns        24610 ns        28469 H
BM_sim_H/14            24696 ns        24695 ns        27988 H
BM_sim_H/15            25724 ns        25722 ns        28385 H
BM_sim_H/16            25621 ns        25619 ns        27282 H
BM_sim_H/17            26688 ns        26685 ns        26094 H
BM_sim_H/18            38309 ns        38305 ns        14620 H
BM_sim_H/19            61861 ns        61859 ns        10369 H
BM_sim_H/20           211105 ns       211092 ns         3254 H
BM_sim_H/21           414356 ns       414331 ns         1696 H
BM_sim_H/22           800980 ns       800958 ns          820 H
BM_sim_H/23          1603827 ns      1603795 ns          432 H
BM_sim_H/24          3181423 ns      3181202 ns          220 H
BM_sim_H/25          6316012 ns      6315664 ns          115 H
BM_sim_T/4               596 ns          531 ns      1063853 T
BM_sim_T/5               586 ns          522 ns      1144787 T
BM_sim_T/6               429 ns          392 ns      1826865 T
BM_sim_T/7               374 ns          350 ns      1969759 T
BM_sim_T/8               354 ns          334 ns      2042546 T
BM_sim_T/9               322 ns          311 ns      2251085 T
BM_sim_T/10              305 ns          298 ns      2341433 T
BM_sim_T/11            23566 ns        23563 ns        29213 T
BM_sim_T/12            24272 ns        24269 ns        29684 T
BM_sim_T/13            23836 ns        23834 ns        29901 T
BM_sim_T/14            23814 ns        23812 ns        29398 T
BM_sim_T/15            25176 ns        25173 ns        27890 T
BM_sim_T/16            30957 ns        30945 ns        25974 T
BM_sim_T/17            38142 ns        38134 ns        18290 T
BM_sim_T/18            52379 ns        52373 ns        11892 T
BM_sim_T/19            95812 ns        95807 ns         6910 T
BM_sim_T/20           369271 ns       369240 ns         1909 T
BM_sim_T/21           720412 ns       720366 ns          930 T
BM_sim_T/22          1425567 ns      1425561 ns          484 T
BM_sim_T/23          2823849 ns      2823788 ns          247 T
BM_sim_T/24          5614832 ns      5614501 ns          127 T
BM_sim_T/25         11145959 ns     11145146 ns           66 T
BM_sim_CNOT/4            677 ns          605 ns      1000000 CNOT
BM_sim_CNOT/5            777 ns          714 ns       989520 CNOT
BM_sim_CNOT/6            575 ns          513 ns      1090034 CNOT
BM_sim_CNOT/7            422 ns          379 ns      1834157 CNOT
BM_sim_CNOT/8            378 ns          349 ns      1955417 CNOT
BM_sim_CNOT/9            341 ns          323 ns      2214292 CNOT
BM_sim_CNOT/10           308 ns          298 ns      2374219 CNOT
BM_sim_CNOT/11         25155 ns        25152 ns        27502 CNOT
BM_sim_CNOT/12         24841 ns        24838 ns        28230 CNOT
BM_sim_CNOT/13         25220 ns        25217 ns        28538 CNOT
BM_sim_CNOT/14         24525 ns        24521 ns        27621 CNOT
BM_sim_CNOT/15         24321 ns        24318 ns        27659 CNOT
BM_sim_CNOT/16         28957 ns        28954 ns        26956 CNOT
BM_sim_CNOT/17         26045 ns        26040 ns        26938 CNOT
BM_sim_CNOT/18         27007 ns        27003 ns        25367 CNOT
BM_sim_CNOT/19         49399 ns        49394 ns        12751 CNOT
BM_sim_CNOT/20        129419 ns       129411 ns         5307 CNOT
BM_sim_CNOT/21        236652 ns       236634 ns         2917 CNOT
BM_sim_CNOT/22        459842 ns       459781 ns         1531 CNOT
BM_sim_CNOT/23        901098 ns       901009 ns          753 CNOT
BM_sim_CNOT/24       1777652 ns      1777616 ns          395 CNOT
BM_sim_CNOT/25       3532641 ns      3532509 ns          199 CNOT
BM_sim_Toffoli/4         646 ns          596 ns      1420794 Toffoli
BM_sim_Toffoli/5         696 ns          640 ns      1119252 Toffoli
BM_sim_Toffoli/6         740 ns          680 ns      1280421 Toffoli
BM_sim_Toffoli/7         684 ns          599 ns      1137688 Toffoli
BM_sim_Toffoli/8         419 ns          379 ns      1783350 Toffoli
BM_sim_Toffoli/9         372 ns          346 ns      2058936 Toffoli
BM_sim_Toffoli/10        328 ns          313 ns      2225554 Toffoli
BM_sim_Toffoli/11      25722 ns        25706 ns        27219 Toffoli
BM_sim_Toffoli/12      25827 ns        25826 ns        27243 Toffoli
BM_sim_Toffoli/13      25751 ns        25747 ns        26671 Toffoli
BM_sim_Toffoli/14      25764 ns        25764 ns        27145 Toffoli
BM_sim_Toffoli/15      29746 ns        29744 ns        26828 Toffoli
BM_sim_Toffoli/16      25661 ns        25660 ns        26922 Toffoli
BM_sim_Toffoli/17      25719 ns        25717 ns        27324 Toffoli
BM_sim_Toffoli/18      26146 ns        26144 ns        26152 Toffoli
BM_sim_Toffoli/19      46794 ns        46789 ns        13846 Toffoli
BM_sim_Toffoli/20     102813 ns       102811 ns         6398 Toffoli
BM_sim_Toffoli/21     184932 ns       184913 ns         3685 Toffoli
BM_sim_Toffoli/22     344639 ns       344618 ns         2036 Toffoli
BM_sim_Toffoli/23     670496 ns       670439 ns          992 Toffoli
BM_sim_Toffoli/24    1329404 ns      1329354 ns          524 Toffoli
BM_sim_Toffoli/25    2601449 ns      2600936 ns          269 Toffoli
BM_sim_Rx/4              625 ns          554 ns      1183200 Rx
BM_sim_Rx/5              547 ns          488 ns      1201053 Rx
BM_sim_Rx/6              412 ns          369 ns      1822742 Rx
BM_sim_Rx/7              360 ns          332 ns      2108205 Rx
BM_sim_Rx/8              318 ns          305 ns      2273275 Rx
BM_sim_Rx/9              299 ns          291 ns      2393503 Rx
BM_sim_Rx/10             284 ns          280 ns      2430357 Rx
BM_sim_Rx/11           25235 ns        25232 ns        28303 Rx
BM_sim_Rx/12           24925 ns        24924 ns        43500 Rx
BM_sim_Rx/13           27080 ns        27069 ns        27805 Rx
BM_sim_Rx/14           25209 ns        25206 ns        28640 Rx
BM_sim_Rx/15           26155 ns        26153 ns        26852 Rx
BM_sim_Rx/16           25757 ns        25753 ns        26530 Rx
BM_sim_Rx/17           26784 ns        26781 ns        26146 Rx
BM_sim_Rx/18           38234 ns        38233 ns        18294 Rx
BM_sim_Rx/19           61962 ns        61954 ns        10220 Rx
BM_sim_Rx/20          222351 ns       222331 ns         3148 Rx
BM_sim_Rx/21          415150 ns       415083 ns         1686 Rx
BM_sim_Rx/22          820492 ns       820483 ns          817 Rx
BM_sim_Rx/23         1611268 ns      1611238 ns          426 Rx
BM_sim_Rx/24         3210008 ns      3209779 ns          221 Rx
BM_sim_Rx/25         6339580 ns      6338987 ns          113 Rx
BM_sim_Ry/4              675 ns          598 ns      1234713 Ry
BM_sim_Ry/5              740 ns          646 ns      1059282 Ry
BM_sim_Ry/6              447 ns          402 ns      1741965 Ry
BM_sim_Ry/7              388 ns          356 ns      1937756 Ry
BM_sim_Ry/8              338 ns          319 ns      2212755 Ry
BM_sim_Ry/9              314 ns          303 ns      2337792 Ry
BM_sim_Ry/10             296 ns          291 ns      2437109 Ry
BM_sim_Ry/11           27298 ns        27294 ns        28810 Ry
BM_sim_Ry/12           23882 ns        23879 ns        28938 Ry
BM_sim_Ry/13           24517 ns        24514 ns        29010 Ry
BM_sim_Ry/14           24373 ns        24370 ns        28657 Ry
BM_sim_Ry/15           24638 ns        24635 ns        28010 Ry
BM_sim_Ry/16           25001 ns        24998 ns        27480 Ry
BM_sim_Ry/17           27003 ns        27002 ns        26122 Ry
BM_sim_Ry/18           38464 ns        38459 ns        18384 Ry
BM_sim_Ry/19           62148 ns        62140 ns        10262 Ry
BM_sim_Ry/20          222616 ns       222586 ns         3145 Ry
BM_sim_Ry/21          415209 ns       415147 ns         1685 Ry
BM_sim_Ry/22          822340 ns       822322 ns          821 Ry
BM_sim_Ry/23         1616264 ns      1614425 ns          430 Ry
BM_sim_Ry/24         3212534 ns      3212245 ns          219 Ry
BM_sim_Ry/25         6340322 ns      6339099 ns          110 Ry

WrathfulSpatula commented 2 years ago

I suspect that, for the desired comparison, these benchmarks should all initialize the simulator outside of the loop and call Finish() before returning. (Qrack is mostly asynchronous, depending on conditions, and Finish() blocks until completion.)
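
A toy sketch of that point, assuming nothing about Qrack's real internals: `ToySim` and `RunBody` below are hypothetical names invented for illustration, and the "asynchrony" is modeled as a plain deferred queue rather than an OpenCL command queue. It only shows why a gate call that returns immediately is not the same as a finished gate, and why the simulator is constructed once, outside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous simulator. This is NOT Qrack's API:
// gate calls only queue work, and Finish() blocks until the queue is drained.
class ToySim {
public:
    explicit ToySim(std::size_t qubits)
        : qubits_(qubits), state_(std::size_t{1} << qubits, 0.0) {
        state_[0] = 1.0;  // start in |0...0>
    }
    ToySim(const ToySim&) = delete;  // queued lambdas capture `this`
    ToySim& operator=(const ToySim&) = delete;

    // Queue an X gate on `target` and return immediately.
    void X(std::size_t target) {
        assert(target < qubits_);
        pending_.push([this, target] { ApplyX(target); });
    }

    // Execute everything queued so far.
    void Finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    std::size_t Pending() const { return pending_.size(); }
    double Amplitude(std::size_t index) const { return state_[index]; }

private:
    // Swap each pair of amplitudes that differ only in the target bit.
    void ApplyX(std::size_t target) {
        const std::size_t mask = std::size_t{1} << target;
        for (std::size_t i = 0; i < state_.size(); ++i) {
            if ((i & mask) == 0) std::swap(state_[i], state_[i | mask]);
        }
    }

    std::size_t qubits_;
    std::vector<double> state_;
    std::queue<std::function<void()>> pending_;
};

// Shape of one benchmark body: set up once, then apply a gate per iteration.
// Returns how many gates are still unexecuted when the loop ends.
std::size_t RunBody(std::size_t qubits, int iterations, bool finish_in_loop) {
    ToySim sim(qubits);  // constructed outside the (would-be) timed loop
    for (int i = 0; i < iterations; ++i) {
        sim.X(0);
        if (finish_in_loop) sim.Finish();  // the iteration now includes the gate's work
    }
    const std::size_t unfinished = sim.Pending();
    sim.Finish();  // drain the queue before returning
    return unfinished;
}
```

In this toy model, `RunBody(4, 10, true)` leaves nothing pending at the end of the loop, while `RunBody(4, 10, false)` leaves all 10 gates queued, so a timer wrapped around the loop alone would have measured only the cost of enqueueing.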

Sorry to spam, but after c5c013a these results look more like what I expected:

2021-12-30T22:34:50-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.95, 0.74, 0.75
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_sim_X/4             73620 ns        36897 ns        19254 X
BM_sim_X/5             73509 ns        36964 ns        19097 X
BM_sim_X/6             73321 ns        36781 ns        18597 X
BM_sim_X/7             74625 ns        36009 ns        19213 X
BM_sim_X/8             80564 ns        40296 ns        16508 X
BM_sim_X/9             90610 ns        43907 ns        17100 X
BM_sim_X/10            87139 ns        38694 ns        18058 X
BM_sim_X/11            25241 ns        25239 ns        27651 X
BM_sim_X/12            25197 ns        25196 ns        27296 X
BM_sim_X/13            25019 ns        25016 ns        27704 X
BM_sim_X/14            25041 ns        25040 ns        27705 X
BM_sim_X/15            25400 ns        25397 ns        27269 X
BM_sim_X/16            26712 ns        26711 ns        26871 X
BM_sim_X/17            36641 ns        36640 ns        19033 X
BM_sim_X/18            57733 ns        57730 ns        13022 X
BM_sim_X/19            71597 ns        71586 ns         8977 X
BM_sim_X/20           220346 ns       220346 ns         3151 X
BM_sim_X/21           412093 ns       412045 ns         1705 X
BM_sim_X/22           790942 ns       790908 ns          825 X
BM_sim_X/23          1574414 ns      1574120 ns          437 X
BM_sim_X/24          3049004 ns      3048734 ns          224 X
BM_sim_X/25          6197424 ns      6196762 ns          105 X
BM_sim_H/4             72486 ns        36369 ns        19443 H
BM_sim_H/5             72338 ns        36019 ns        18792 H
BM_sim_H/6             79231 ns        38708 ns        19516 H
BM_sim_H/7             72443 ns        35189 ns        19887 H
BM_sim_H/8             74737 ns        35122 ns        20150 H
BM_sim_H/9             84327 ns        41577 ns        17572 H
BM_sim_H/10            86417 ns        38103 ns        18561 H
BM_sim_H/11            28224 ns        27913 ns        24093 H
BM_sim_H/12            30942 ns        30940 ns        17830 H
BM_sim_H/13            27166 ns        27165 ns        21197 H
BM_sim_H/14            34032 ns        34031 ns        26009 H
BM_sim_H/15            31409 ns        31408 ns        25939 H
BM_sim_H/16            36161 ns        36158 ns        18071 H
BM_sim_H/17            40351 ns        40350 ns        16937 H
BM_sim_H/18            51431 ns        51427 ns        13647 H
BM_sim_H/19            74466 ns        74466 ns         8001 H
BM_sim_H/20           243701 ns       243683 ns         2863 H
BM_sim_H/21           489477 ns       489474 ns         1552 H
BM_sim_H/22           881362 ns       881259 ns          794 H
BM_sim_H/23          1907669 ns      1907434 ns          381 H
BM_sim_H/24          3563037 ns      3560809 ns          192 H
BM_sim_H/25          7144112 ns      7143757 ns           94 H
BM_sim_T/4             72935 ns        36658 ns        19374 T
BM_sim_T/5             72518 ns        36258 ns        19062 T
BM_sim_T/6             72168 ns        35931 ns        19356 T
BM_sim_T/7             72583 ns        35302 ns        19824 T
BM_sim_T/8             78908 ns        37761 ns        20267 T
BM_sim_T/9             84283 ns        43088 ns        16116 T
BM_sim_T/10            84578 ns        38228 ns        18319 T
BM_sim_T/11            27394 ns        27392 ns        25584 T
BM_sim_T/12            27978 ns        27796 ns        25481 T
BM_sim_T/13            27966 ns        27948 ns        25204 T
BM_sim_T/14            27734 ns        27733 ns        25132 T
BM_sim_T/15            27846 ns        27830 ns        24761 T
BM_sim_T/16            38472 ns        38468 ns        18023 T
BM_sim_T/17            49993 ns        49957 ns        12264 T
BM_sim_T/18            68159 ns        68155 ns        10646 T
BM_sim_T/19           107429 ns       107357 ns         6183 T
BM_sim_T/20           381872 ns       381844 ns         1831 T
BM_sim_T/21           733143 ns       733052 ns          890 T
BM_sim_T/22          1450108 ns      1449967 ns          464 T
BM_sim_T/23          2857520 ns      2857273 ns          242 T
BM_sim_T/24          5679711 ns      5679333 ns          110 T
BM_sim_T/25         11353450 ns     11352924 ns           59 T
BM_sim_CNOT/4          71175 ns        35409 ns        19865 CNOT
BM_sim_CNOT/5          78812 ns        39786 ns        19291 CNOT
BM_sim_CNOT/6          71976 ns        36004 ns        19335 CNOT
BM_sim_CNOT/7          71894 ns        35786 ns        19648 CNOT
BM_sim_CNOT/8          72030 ns        35230 ns        19891 CNOT
BM_sim_CNOT/9          82921 ns        44106 ns        16039 CNOT
BM_sim_CNOT/10         84829 ns        43329 ns        16130 CNOT
BM_sim_CNOT/11         27472 ns        27471 ns        25737 CNOT
BM_sim_CNOT/12         27477 ns        27475 ns        18510 CNOT
BM_sim_CNOT/13         27448 ns        27445 ns        25470 CNOT
BM_sim_CNOT/14         27802 ns        27800 ns        25189 CNOT
BM_sim_CNOT/15         27631 ns        27630 ns        24541 CNOT
BM_sim_CNOT/16         27869 ns        27866 ns        25046 CNOT
BM_sim_CNOT/17         38741 ns        38731 ns        18111 CNOT
BM_sim_CNOT/18         38856 ns        38845 ns        17894 CNOT
BM_sim_CNOT/19         61823 ns        61797 ns        10270 CNOT
BM_sim_CNOT/20        142511 ns       142481 ns         4697 CNOT
BM_sim_CNOT/21        251819 ns       251788 ns         2779 CNOT
BM_sim_CNOT/22        470824 ns       470776 ns         1318 CNOT
BM_sim_CNOT/23        917129 ns       917085 ns          584 CNOT
BM_sim_CNOT/24       1799441 ns      1799201 ns          385 CNOT
BM_sim_CNOT/25       3568946 ns      3568409 ns          196 CNOT
BM_sim_Toffoli/4       70170 ns        34896 ns        19917 Toffoli
BM_sim_Toffoli/5       70068 ns        34889 ns        19548 Toffoli
BM_sim_Toffoli/6       70963 ns        35451 ns        19495 Toffoli
BM_sim_Toffoli/7       71479 ns        35846 ns        19370 Toffoli
BM_sim_Toffoli/8       78439 ns        38282 ns        19669 Toffoli
BM_sim_Toffoli/9       72227 ns        35228 ns        19933 Toffoli
BM_sim_Toffoli/10      73801 ns        35011 ns        20157 Toffoli
BM_sim_Toffoli/11      39220 ns        39217 ns        13228 Toffoli
BM_sim_Toffoli/12      38680 ns        38678 ns        19120 Toffoli
BM_sim_Toffoli/13      38529 ns        38529 ns        18850 Toffoli
BM_sim_Toffoli/14      38874 ns        38873 ns        17835 Toffoli
BM_sim_Toffoli/15      40048 ns        40046 ns        17933 Toffoli
BM_sim_Toffoli/16      50383 ns        50382 ns        10000 Toffoli
BM_sim_Toffoli/17      41631 ns        41630 ns        16397 Toffoli
BM_sim_Toffoli/18      46316 ns        46312 ns        15017 Toffoli
BM_sim_Toffoli/19      60038 ns        60035 ns        11109 Toffoli
BM_sim_Toffoli/20     115281 ns       115278 ns         5939 Toffoli
BM_sim_Toffoli/21     200074 ns       200057 ns         3454 Toffoli
BM_sim_Toffoli/22     363645 ns       363577 ns         1915 Toffoli
BM_sim_Toffoli/23     685941 ns       685897 ns          982 Toffoli
BM_sim_Toffoli/24    1338329 ns      1338162 ns          513 Toffoli
BM_sim_Toffoli/25    2635010 ns      2634819 ns          265 Toffoli
BM_sim_Rx/4            72102 ns        35963 ns        19794 Rx
BM_sim_Rx/5            80650 ns        40312 ns        17686 Rx
BM_sim_Rx/6            72756 ns        36064 ns        19231 Rx
BM_sim_Rx/7            73218 ns        35729 ns        19798 Rx
BM_sim_Rx/8            74390 ns        35158 ns        19986 Rx
BM_sim_Rx/9            84958 ns        42380 ns        16617 Rx
BM_sim_Rx/10           92790 ns        40361 ns        18615 Rx
BM_sim_Rx/11           27356 ns        27353 ns        25717 Rx
BM_sim_Rx/12           27499 ns        27494 ns        25093 Rx
BM_sim_Rx/13           27236 ns        27233 ns        25461 Rx
BM_sim_Rx/14           28113 ns        28110 ns        25462 Rx
BM_sim_Rx/15           27886 ns        27882 ns        23699 Rx
BM_sim_Rx/16           32127 ns        32125 ns        22571 Rx
BM_sim_Rx/17           40390 ns        40389 ns        17317 Rx
BM_sim_Rx/18           52093 ns        52092 ns        12340 Rx
BM_sim_Rx/19           76243 ns        76237 ns         8684 Rx
BM_sim_Rx/20          233759 ns       233754 ns         3021 Rx
BM_sim_Rx/21          429624 ns       429511 ns         1566 Rx
BM_sim_Rx/22          832338 ns       832121 ns          819 Rx
BM_sim_Rx/23         1627797 ns      1627682 ns          381 Rx
BM_sim_Rx/24         3208769 ns      3208125 ns          214 Rx
BM_sim_Rx/25         6565038 ns      6564464 ns          104 Rx
BM_sim_Ry/4            72129 ns        36269 ns        19373 Ry
BM_sim_Ry/5            72619 ns        36437 ns        19428 Ry
BM_sim_Ry/6            72641 ns        36040 ns        19361 Ry
BM_sim_Ry/7            79539 ns        38602 ns        19708 Ry
BM_sim_Ry/8            73979 ns        35249 ns        20036 Ry
BM_sim_Ry/9            84181 ns        42515 ns        16534 Ry
BM_sim_Ry/10           85892 ns        37822 ns        18464 Ry
BM_sim_Ry/11           28201 ns        28198 ns        24179 Ry
BM_sim_Ry/12           27637 ns        27636 ns        24850 Ry
BM_sim_Ry/13           27421 ns        27420 ns        25703 Ry
BM_sim_Ry/14           27380 ns        27379 ns        25514 Ry
BM_sim_Ry/15           28935 ns        28933 ns        25849 Ry
BM_sim_Ry/16           30196 ns        30191 ns        23309 Ry
BM_sim_Ry/17           38961 ns        38959 ns        18040 Ry
BM_sim_Ry/18           50386 ns        50379 ns        13242 Ry
BM_sim_Ry/19           74318 ns        74314 ns         9136 Ry
BM_sim_Ry/20          234805 ns       234799 ns         3021 Ry
BM_sim_Ry/21          432908 ns       432869 ns         1643 Ry
BM_sim_Ry/22          831943 ns       831909 ns          785 Ry
BM_sim_Ry/23         1688951 ns      1688923 ns          424 Ry
BM_sim_Ry/24         3271922 ns      3271853 ns          210 Ry
BM_sim_Ry/25         6484537 ns      6484107 ns          104 Ry

WrathfulSpatula commented 2 years ago

Last shot at this, with apologies for the rapid-fire comments: with c055501, my thinking was that if timing is on the basis of the entire function, we can capture the effective benefit of Qrack's asynchronous execution by calling Finish() only once at the end of the test. Please correct me if I have misunderstood the framework.
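
The amortization argument can be sketched with a toy cost model. Everything here is an assumption made for illustration: the `CostModel` fields, the function names, and the numbers in the usage note are hypothetical, not measured Qrack values, and the model pessimistically ignores any overlap between enqueueing and execution.

```cpp
#include <cassert>
#include <cstdint>

// Toy cost model; all parameters are assumptions, not measurements:
//   enqueue_ns : host-side cost to queue one gate
//   gate_ns    : cost to actually execute one gate
//   sync_ns    : fixed cost of one blocking Finish() call
struct CostModel {
    std::int64_t enqueue_ns;
    std::int64_t gate_ns;
    std::int64_t sync_ns;
};

// Finish() after every gate: each iteration pays enqueue + execute + sync.
std::int64_t TotalFinishEachIteration(const CostModel& m, std::int64_t iterations) {
    assert(iterations >= 0);
    return iterations * (m.enqueue_ns + m.gate_ns + m.sync_ns);
}

// Finish() once after all gates are queued: the sync cost is paid a single time.
std::int64_t TotalFinishOnce(const CostModel& m, std::int64_t iterations) {
    assert(iterations >= 0);
    return iterations * (m.enqueue_ns + m.gate_ns) + m.sync_ns;
}
```

With the made-up values `CostModel{100, 1000, 5000}` and 10 iterations, the per-iteration variant totals 61000 ns and the single-Finish variant 16000 ns. One caveat: Google Benchmark times only the body of the `for (auto _ : state)` loop, so a single Finish() placed after that loop falls outside the measured region. Whether the queued work is counted therefore depends on where that call sits.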

I'd rather this be an apples-to-apples comparison, wherever Qrack stands comparatively on pure "ket" simulation, so please tell me if I have abused the benchmark framework. Here's where we stand on my laptop:

2021-12-30T23:22:25-05:00
Running ./benchmarks
Run on (16 X 5300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.34, 0.44, 0.48
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_Intel(R)_Gen9_HD_Graphics_NEO.ir
Device #1, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
Default platform: NVIDIA CUDA
Default device: NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: Intel(R) Gen9 HD Graphics NEO
OpenCL device #1: NVIDIA GeForce RTX 3080 Laptop GPU
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_sim_X/4               715 ns          632 ns      1119798 X
BM_sim_X/5               624 ns          560 ns      1029253 X
BM_sim_X/6               451 ns          415 ns      1394891 X
BM_sim_X/7               409 ns          387 ns      1805975 X
BM_sim_X/8               380 ns          365 ns      1839478 X
BM_sim_X/9               378 ns          369 ns      1999946 X
BM_sim_X/10              356 ns          350 ns      2046663 X
BM_sim_X/11            22381 ns        22380 ns        30632 X
BM_sim_X/12            17254 ns        17252 ns        32276 X
BM_sim_X/13            22685 ns        22683 ns        35592 X
BM_sim_X/14            15013 ns        15012 ns        34353 X
BM_sim_X/15            15040 ns        15040 ns        46784 X
BM_sim_X/16            16092 ns        16092 ns        36889 X
BM_sim_X/17            26787 ns        26782 ns        25980 X
BM_sim_X/18            38092 ns        38088 ns        18504 X
BM_sim_X/19            61161 ns        61153 ns        10679 X
BM_sim_X/20           211708 ns       211690 ns         3097 X
BM_sim_X/21           404082 ns       404033 ns         1752 X
BM_sim_X/22           787886 ns       787783 ns          864 X
BM_sim_X/23          1563176 ns      1562965 ns          442 X
BM_sim_X/24          3026656 ns      3026472 ns          226 X
BM_sim_X/25          6165349 ns      6165078 ns          120 X
BM_sim_H/4               728 ns          641 ns      1350027 H
BM_sim_H/5               525 ns          474 ns      1180350 H
BM_sim_H/6               410 ns          380 ns      1751386 H
BM_sim_H/7               376 ns          353 ns      2031210 H
BM_sim_H/8               346 ns          331 ns      2110597 H
BM_sim_H/9               340 ns          330 ns      2185749 H
BM_sim_H/10              346 ns          339 ns      2127119 H
BM_sim_H/11            22398 ns        22397 ns        31979 H
BM_sim_H/12            22154 ns        22154 ns        31469 H
BM_sim_H/13            21696 ns        21694 ns        32669 H
BM_sim_H/14            22625 ns        22624 ns        34865 H
BM_sim_H/15            24781 ns        24779 ns        28563 H
BM_sim_H/16            26008 ns        26005 ns        26738 H
BM_sim_H/17            26848 ns        26848 ns        25846 H
BM_sim_H/18            38348 ns        38346 ns        14273 H
BM_sim_H/19            63220 ns        63196 ns        10030 H
BM_sim_H/20           222427 ns       222415 ns         3095 H
BM_sim_H/21           417771 ns       417766 ns         1692 H
BM_sim_H/22           811376 ns       811313 ns          824 H
BM_sim_H/23          1616207 ns      1616176 ns          425 H
BM_sim_H/24          3205480 ns      3205312 ns          218 H
BM_sim_H/25          6370058 ns      6370056 ns          108 H
BM_sim_T/4               717 ns          637 ns      1215602 T
BM_sim_T/5               587 ns          523 ns      1086015 T
BM_sim_T/6               436 ns          397 ns      1701196 T
BM_sim_T/7               380 ns          360 ns      1939474 T
BM_sim_T/8               364 ns          344 ns      2005481 T
BM_sim_T/9               351 ns          342 ns      2111346 T
BM_sim_T/10              343 ns          338 ns      2089475 T
BM_sim_T/11            22577 ns        22576 ns        30130 T
BM_sim_T/12            21916 ns        21915 ns        30650 T
BM_sim_T/13            23063 ns        23063 ns        29928 T
BM_sim_T/14            23589 ns        23587 ns        29536 T
BM_sim_T/15            25484 ns        25483 ns        27702 T
BM_sim_T/16            26458 ns        26457 ns        25937 T
BM_sim_T/17            43570 ns        43565 ns        18343 T
BM_sim_T/18            60569 ns        60562 ns        10415 T
BM_sim_T/19            96977 ns        96973 ns         6546 T
BM_sim_T/20           379615 ns       379398 ns         1850 T
BM_sim_T/21           729255 ns       729208 ns          900 T
BM_sim_T/22          1435540 ns      1434500 ns          475 T
BM_sim_T/23          2853440 ns      2853381 ns          247 T
BM_sim_T/24          5664120 ns      5663730 ns          122 T
BM_sim_T/25         11243131 ns     11243213 ns           67 T
BM_sim_CNOT/4            632 ns          560 ns      1000000 CNOT
BM_sim_CNOT/5            605 ns          540 ns      1136015 CNOT
BM_sim_CNOT/6            728 ns          652 ns      1015970 CNOT
BM_sim_CNOT/7            466 ns          423 ns      1344763 CNOT
BM_sim_CNOT/8            378 ns          358 ns      1876809 CNOT
BM_sim_CNOT/9            369 ns          353 ns      1983322 CNOT
BM_sim_CNOT/10           403 ns          388 ns      2075209 CNOT
BM_sim_CNOT/11         23250 ns        23249 ns        31227 CNOT
BM_sim_CNOT/12         23835 ns        23834 ns        28826 CNOT
BM_sim_CNOT/13         23409 ns        23406 ns        29293 CNOT
BM_sim_CNOT/14         23761 ns        23757 ns        29621 CNOT
BM_sim_CNOT/15         23495 ns        23493 ns        29198 CNOT
BM_sim_CNOT/16         26407 ns        26405 ns        26702 CNOT
BM_sim_CNOT/17         26769 ns        26768 ns        25991 CNOT
BM_sim_CNOT/18         27915 ns        27914 ns        25448 CNOT
BM_sim_CNOT/19         49777 ns        49770 ns        12705 CNOT
BM_sim_CNOT/20        129554 ns       129543 ns         5199 CNOT
BM_sim_CNOT/21        256629 ns       256594 ns         2866 CNOT
BM_sim_CNOT/22        461934 ns       461758 ns         1329 CNOT
BM_sim_CNOT/23        907218 ns       904537 ns          748 CNOT
BM_sim_CNOT/24       1794867 ns      1789897 ns          391 CNOT
BM_sim_CNOT/25       3562287 ns      3551268 ns          198 CNOT
BM_sim_Toffoli/4         507 ns          454 ns      1471194 Toffoli
BM_sim_Toffoli/5         685 ns          632 ns      1301404 Toffoli
BM_sim_Toffoli/6         768 ns          716 ns       989815 Toffoli
BM_sim_Toffoli/7         767 ns          679 ns      1106098 Toffoli
BM_sim_Toffoli/8         475 ns          427 ns      1334113 Toffoli
BM_sim_Toffoli/9         391 ns          368 ns      1902690 Toffoli
BM_sim_Toffoli/10        385 ns          370 ns      1962712 Toffoli
BM_sim_Toffoli/11      25977 ns        25976 ns        27098 Toffoli
BM_sim_Toffoli/12      25873 ns        25872 ns        26669 Toffoli
BM_sim_Toffoli/13      25788 ns        25787 ns        27132 Toffoli
BM_sim_Toffoli/14      25891 ns        25889 ns        27039 Toffoli
BM_sim_Toffoli/15      26072 ns        26069 ns        26593 Toffoli
BM_sim_Toffoli/16      25902 ns        25900 ns        27050 Toffoli
BM_sim_Toffoli/17      26120 ns        26105 ns        26793 Toffoli
BM_sim_Toffoli/18      31619 ns        31614 ns        25620 Toffoli
BM_sim_Toffoli/19      48169 ns        48163 ns        12789 Toffoli
BM_sim_Toffoli/20     105447 ns       105442 ns         6300 Toffoli
BM_sim_Toffoli/21     185478 ns       185468 ns         3649 Toffoli
BM_sim_Toffoli/22     346589 ns       346534 ns         2008 Toffoli
BM_sim_Toffoli/23     675066 ns       674914 ns          968 Toffoli
BM_sim_Toffoli/24    1331968 ns      1330898 ns          515 Toffoli
BM_sim_Toffoli/25    2629952 ns      2629680 ns          265 Toffoli
BM_sim_Rx/4              719 ns          663 ns      1106929 Rx
BM_sim_Rx/5              532 ns          475 ns      1280681 Rx
BM_sim_Rx/6              434 ns          398 ns      1829955 Rx
BM_sim_Rx/7              384 ns          358 ns      1836666 Rx
BM_sim_Rx/8              378 ns          356 ns      2062166 Rx
BM_sim_Rx/9              356 ns          343 ns      2064817 Rx
BM_sim_Rx/10             341 ns          336 ns      2086995 Rx
BM_sim_Rx/11           23705 ns        23703 ns        21262 Rx
BM_sim_Rx/12           22926 ns        22909 ns        30208 Rx
BM_sim_Rx/13           22412 ns        22409 ns        29923 Rx
BM_sim_Rx/14           23252 ns        23237 ns        29999 Rx
BM_sim_Rx/15           25394 ns        25392 ns        27623 Rx
BM_sim_Rx/16           25146 ns        25130 ns        27066 Rx
BM_sim_Rx/17           26723 ns        26721 ns        26276 Rx
BM_sim_Rx/18           38209 ns        38198 ns        18392 Rx
BM_sim_Rx/19           62591 ns        62591 ns         9756 Rx
BM_sim_Rx/20          222190 ns       222185 ns         3165 Rx
BM_sim_Rx/21          417478 ns       417456 ns         1677 Rx
BM_sim_Rx/22          821739 ns       821690 ns          811 Rx
BM_sim_Rx/23         1619233 ns      1619143 ns          424 Rx
BM_sim_Rx/24         3230909 ns      3230844 ns          218 Rx
BM_sim_Rx/25         6374552 ns      6372848 ns          109 Rx
BM_sim_Ry/4              750 ns          665 ns      1226157 Ry
BM_sim_Ry/5              579 ns          519 ns      1119128 Ry
BM_sim_Ry/6              422 ns          387 ns      1762334 Ry
BM_sim_Ry/7              381 ns          357 ns      1904044 Ry
BM_sim_Ry/8              364 ns          348 ns      2055238 Ry
BM_sim_Ry/9              354 ns          345 ns      2019379 Ry
BM_sim_Ry/10             355 ns          351 ns      2018484 Ry
BM_sim_Ry/11           23131 ns        23130 ns        31061 Ry
BM_sim_Ry/12           23137 ns        23137 ns        30620 Ry
BM_sim_Ry/13           23140 ns        23136 ns        30239 Ry
BM_sim_Ry/14           23331 ns        23330 ns        30325 Ry
BM_sim_Ry/15           25961 ns        25959 ns        26672 Ry
BM_sim_Ry/16           26525 ns        26523 ns        26316 Ry
BM_sim_Ry/17           26798 ns        26793 ns        26204 Ry
BM_sim_Ry/18           38467 ns        38467 ns        18081 Ry
BM_sim_Ry/19           69862 ns        69862 ns        11228 Ry
BM_sim_Ry/20          221985 ns       221962 ns         3153 Ry
BM_sim_Ry/21          421635 ns       421581 ns         1661 Ry
BM_sim_Ry/22          822811 ns       822737 ns          857 Ry
BM_sim_Ry/23         1617523 ns      1617524 ns          436 Ry
BM_sim_Ry/24         3215180 ns      3214591 ns          220 Ry
BM_sim_Ry/25         6354351 ns      6353519 ns          119 Ry

The discontinuity at around 11 qubits is likely due to "hybridization" between CPU and GPU, by the way, as this is the threshold for switching from one device to the other. Maybe the threshold could be tuned better, but, by wall clock time in the Qrack benchmark suite, CPU time appears to start doubling with each added qubit above about this threshold, whereas the GPU might still drag slightly at the same qubit width because it fails to occupy all of its processing elements.
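To make the threshold idea concrete, here is a minimal sketch of width-based dispatch. The names `Backend`, `pick_backend`, and `kGpuThresholdQubits` are hypothetical and are not Qrack's API; the value 11 is only read off the jump in the timings above, and the assumption is that narrow kets stay on the CPU while wider ones go to the GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; this is not Qrack's actual API.
enum class Backend { Cpu, Gpu };

// Assumed threshold, read off the jump between 10 and 11 qubits in the timings.
const std::size_t kGpuThresholdQubits = 11;

// Narrow kets stay on the CPU; at or above the threshold, use the GPU.
inline Backend pick_backend(std::size_t qubits,
                            std::size_t threshold = kGpuThresholdQubits)
{
    return qubits < threshold ? Backend::Cpu : Backend::Gpu;
}

// Number of complex amplitudes in an n-qubit ket: 2^n.
inline std::size_t amplitude_count(std::size_t qubits)
{
    assert(qubits < sizeof(std::size_t) * 8); // avoid an over-wide shift
    return std::size_t{1} << qubits;
}
```

Under these assumptions, `pick_backend(10)` selects the CPU and `pick_backend(11)` selects the GPU, where a ket already holds `amplitude_count(11)` = 2048 amplitudes. Tuning the threshold would just mean passing a different second argument.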

Roger-luo commented 2 years ago

Very impressive work! This result seems to make sense in general. I'm wondering if you could also implement the variational circuit benchmark for the sake of completeness?

Roger-luo commented 2 years ago

And as I mentioned in the issue, if you want to show off the algorithms, it's OK to include them, but just remember to have a note page explaining what the algorithms are and what their advantages and limitations are. Then when I run the results I'll add a footnote similar to the one for ddqsim.

WrathfulSpatula commented 2 years ago

Thanks for the quick turnaround, by the way! As it's New Year's Eve, I might not have time today, but I can definitely figure out how to implement the variational circuit as well, likely over the weekend. Since you say it's alright, I'll also add benchmarks and a notes document for our "default optimal layer stack," which is mostly a combination of Schmidt decomposition on kets and extended stabilizer subsystems that can also transparently fall back to ket, in addition to this underlying CPU/GPU "hybridized" ket simulation. Just to brag even more, we even switch over to an Intel-QS-like "paged" simulation once maximum single GPU allocation is exceeded, which gives about 2 additional qubits of width, since NVIDIA GPUs are almost universally "chopped" into 4 equal maximum allocation segments that OpenCL has to manually manage, while I think CUDA can allocate over the full VRAM transparently instead. All of the above works for multi-GPU or distributed simulation, as well.
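The "about 2 additional qubits" figure follows from the 4 segments: 4 times the memory holds 4 times the amplitudes, and log2(4) = 2. A rough sketch of that arithmetic is below, assuming 8 bytes per amplitude (single-precision complex) and ignoring any other buffers; `max_qubits` is a hypothetical helper, and the 16 GiB figure in the usage note is an example value, not a measured one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest n such that 2^n amplitudes fit in the given bytes.
inline unsigned max_qubits(std::uint64_t bytes_available,
                           std::uint64_t bytes_per_amplitude)
{
    assert(bytes_per_amplitude > 0);
    const std::uint64_t amplitudes = bytes_available / bytes_per_amplitude;
    assert(amplitudes > 0);

    // Integer floor(log2(amplitudes)), guarding against a 64-bit shift.
    unsigned qubits = 0;
    while (qubits + 1 < 64 && (amplitudes >> (qubits + 1)) != 0) {
        ++qubits;
    }
    return qubits;
}
```

For example, with an assumed 16 GiB of VRAM split into 4 segments of 4 GiB, a single allocation gives `max_qubits(4 GiB, 8)` = 29 qubits, while paging across all 4 segments gives `max_qubits(16 GiB, 8)` = 31 qubits.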

I hope the community benefits from the Qrack contributors' work, and thank you for including us! Happy New Year!

WrathfulSpatula commented 2 years ago

I just added 1e4ef43 with the variational benchmark.

I'm very sorry for the delay, by the way. I was ill for a good chunk of January, unfortunately, but I feel much better, now!

If I were to add Qrack's default optimized "layer stack" benchmarks (for which X, for example, and many of the other benchmarks are trivial), I could use your advice on where the best place is to put them, though.

WrathfulSpatula commented 2 years ago

40b73c0 adds a const bool that can be toggled in code to switch between default optimal stack and "ket" only. It's set for "ket" by default, as is appropriate.

WrathfulSpatula commented 2 years ago

Lastly, for 7bce003, it looks like ComputeStatistics() for "min" estimator is used for QuEST, so I take it that it should be used here, too. (It might have been turned off while the default optimal layer stack was in use.)
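For reference, a "min" estimator of this kind can be as small as the sketch below. Google Benchmark's `ComputeStatistics()` accepts a statistic name plus a function over the per-repetition results as a `std::vector<double>`; the function name `min_estimator` here is illustrative and is not necessarily what the commit uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Smallest observed value across benchmark repetitions.
// Illustrative name; could be registered along the lines of
//   BENCHMARK(BM_sim_X)->ComputeStatistics("min", min_estimator);
inline double min_estimator(const std::vector<double>& samples)
{
    assert(!samples.empty()); // std::min_element on an empty range is not dereferenceable
    return *std::min_element(samples.begin(), samples.end());
}
```

The minimum is a common choice for timing data because noise from scheduling or CPU frequency scaling only ever adds time, so the smallest sample is the least contaminated one.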

Roger-luo commented 2 years ago

thanks for this awesome work, do you want to do the default benchmark in this PR or a separate one? this PR currently looks good to me, I can merge it first if you want.

> If I were to add Qrack's default optimized "layer stack" benchmarks (for which X, for example, and many of the other benchmarks are trivial), I could use your advice on where the best place is to put them, though.

just use a separate folder for different build options etc.?

WrathfulSpatula commented 2 years ago

@Roger-luo Thanks for circling back quickly! The bool with a comment at the top of the benchmark file serves much the same purpose as a separate folder, without duplicating all of the code that is identical up to that one setting. It generally bothers me when code needs to be changed by the user to invoke intended functionality, but I think your users can figure this one out, if they have reason to want that bool.

LGTM, too. Merge away!

WrathfulSpatula commented 2 years ago

By the way, the build assumes that your benchmark instance has libOpenCL and an OpenCL ICD for the same GPU you use for the CUDA benchmarks. You probably already have both in that environment, but we'll see.

WrathfulSpatula commented 2 years ago

(I could make the setup.sh script set those up if they aren't present, but they often already are when the environment is configured for CUDA development with the toolkit.)

Roger-luo commented 2 years ago

cool, thanks!

Roger-luo commented 2 years ago

it would take some time to update the benchmark results after my refactor of the benchmark suite tho, I'll try to take care of the deps in the refactor, we will see

WrathfulSpatula commented 2 years ago

Thank you! If libOpenCL.so and the ICD are present, as they usually are when the CUDA toolkit is installed, the only other build dependency I can think of, besides the cmake and g++ that are already in there, is the OpenCL C++ headers, installed with sudo apt install opencl-headers. Come find me if you can't figure it out, though.