Looks good in general. Any luck getting results from this? It would help the review if you could post them here.
Raw benchmark data:
2020-12-17T01:48:57+00:00
Running ./benchmarks
Run on (8 X 3610.15 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1024 KiB (x4)
L3 Unified 36608 KiB (x1)
Load Average: 0.06, 0.25, 0.21
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
BM_sim_X/4 2325 ns 2325 ns 300698 X
BM_sim_X/5 2443 ns 2443 ns 287186 X
BM_sim_X/6 2487 ns 2487 ns 282605 X
BM_sim_X/7 2531 ns 2531 ns 276338 X
BM_sim_X/8 2555 ns 2555 ns 273946 X
BM_sim_X/9 2592 ns 2592 ns 269582 X
BM_sim_X/10 2627 ns 2627 ns 266709 X
BM_sim_X/11 2662 ns 2662 ns 262402 X
BM_sim_X/12 2710 ns 2709 ns 259586 X
BM_sim_X/13 2741 ns 2741 ns 255005 X
BM_sim_X/14 2780 ns 2780 ns 251642 X
BM_sim_X/15 2819 ns 2819 ns 248317 X
BM_sim_X/16 2863 ns 2862 ns 244923 X
BM_sim_X/17 2905 ns 2905 ns 240793 X
BM_sim_X/18 2942 ns 2942 ns 237475 X
BM_sim_X/19 2982 ns 2982 ns 234879 X
BM_sim_X/20 3020 ns 3020 ns 231804 X
BM_sim_X/21 3065 ns 3065 ns 228264 X
BM_sim_X/22 3103 ns 3103 ns 225451 X
BM_sim_X/23 3134 ns 3134 ns 223370 X
BM_sim_X/24 3158 ns 3158 ns 221067 X
BM_sim_X/25 3197 ns 3197 ns 218951 X
BM_sim_H/4 2285 ns 2285 ns 306449 H
BM_sim_H/5 2396 ns 2396 ns 291870 H
BM_sim_H/6 2436 ns 2436 ns 287337 H
BM_sim_H/7 2472 ns 2472 ns 283326 H
BM_sim_H/8 2510 ns 2510 ns 278786 H
BM_sim_H/9 2557 ns 2557 ns 273911 H
BM_sim_H/10 2596 ns 2596 ns 268826 H
BM_sim_H/11 2631 ns 2631 ns 265843 H
BM_sim_H/12 2676 ns 2676 ns 261616 H
BM_sim_H/13 2710 ns 2710 ns 258172 H
BM_sim_H/14 2747 ns 2747 ns 254897 H
BM_sim_H/15 2787 ns 2787 ns 251065 H
BM_sim_H/16 2828 ns 2828 ns 247622 H
BM_sim_H/17 2868 ns 2868 ns 243922 H
BM_sim_H/18 2909 ns 2909 ns 240744 H
BM_sim_H/19 2949 ns 2949 ns 237195 H
BM_sim_H/20 2984 ns 2984 ns 234385 H
BM_sim_H/21 3033 ns 3033 ns 231157 H
BM_sim_H/22 3080 ns 3080 ns 227593 H
BM_sim_H/23 3114 ns 3114 ns 224959 H
BM_sim_H/24 3148 ns 3147 ns 221775 H
BM_sim_H/25 3184 ns 3184 ns 220090 H
BM_sim_T/4 2390 ns 2389 ns 293069 T
BM_sim_T/5 2559 ns 2559 ns 273463 T
BM_sim_T/6 2613 ns 2613 ns 269455 T
BM_sim_T/7 2650 ns 2650 ns 264227 T
BM_sim_T/8 2685 ns 2684 ns 260352 T
BM_sim_T/9 2734 ns 2734 ns 256540 T
BM_sim_T/10 2777 ns 2777 ns 252176 T
BM_sim_T/11 2815 ns 2815 ns 248837 T
BM_sim_T/12 2858 ns 2858 ns 245207 T
BM_sim_T/13 2885 ns 2885 ns 242803 T
BM_sim_T/14 2922 ns 2922 ns 239540 T
BM_sim_T/15 2957 ns 2957 ns 236797 T
BM_sim_T/16 3005 ns 3005 ns 232891 T
BM_sim_T/17 3046 ns 3046 ns 229727 T
BM_sim_T/18 3081 ns 3081 ns 227016 T
BM_sim_T/19 3128 ns 3127 ns 224105 T
BM_sim_T/20 3154 ns 3154 ns 222053 T
BM_sim_T/21 3189 ns 3188 ns 219396 T
BM_sim_T/22 3219 ns 3218 ns 217595 T
BM_sim_T/23 3263 ns 3263 ns 214879 T
BM_sim_T/24 3309 ns 3309 ns 212068 T
BM_sim_T/25 3358 ns 3358 ns 208262 T
BM_sim_CNOT/4 2338 ns 2338 ns 299177 CNOT
BM_sim_CNOT/5 2485 ns 2485 ns 284034 CNOT
BM_sim_CNOT/6 2521 ns 2521 ns 277409 CNOT
BM_sim_CNOT/7 2547 ns 2546 ns 274040 CNOT
BM_sim_CNOT/8 2588 ns 2588 ns 270653 CNOT
BM_sim_CNOT/9 2623 ns 2623 ns 266634 CNOT
BM_sim_CNOT/10 2668 ns 2668 ns 262382 CNOT
BM_sim_CNOT/11 2709 ns 2709 ns 258427 CNOT
BM_sim_CNOT/12 2757 ns 2757 ns 254590 CNOT
BM_sim_CNOT/13 2777 ns 2776 ns 252118 CNOT
BM_sim_CNOT/14 2816 ns 2816 ns 248648 CNOT
BM_sim_CNOT/15 2849 ns 2849 ns 245789 CNOT
BM_sim_CNOT/16 2892 ns 2892 ns 242138 CNOT
BM_sim_CNOT/17 2935 ns 2935 ns 238459 CNOT
BM_sim_CNOT/18 2974 ns 2974 ns 235151 CNOT
BM_sim_CNOT/19 3016 ns 3016 ns 231221 CNOT
BM_sim_CNOT/20 3057 ns 3057 ns 229293 CNOT
BM_sim_CNOT/21 3100 ns 3100 ns 225620 CNOT
BM_sim_CNOT/22 3139 ns 3139 ns 223573 CNOT
BM_sim_CNOT/23 3181 ns 3181 ns 220125 CNOT
BM_sim_CNOT/24 3218 ns 3217 ns 217522 CNOT
BM_sim_CNOT/25 3256 ns 3255 ns 215031 CNOT
BM_sim_Toffoli/4 2347 ns 2347 ns 298082 Toffoli
BM_sim_Toffoli/5 2476 ns 2476 ns 282757 Toffoli
BM_sim_Toffoli/6 2520 ns 2520 ns 278365 Toffoli
BM_sim_Toffoli/7 2553 ns 2553 ns 272971 Toffoli
BM_sim_Toffoli/8 2598 ns 2598 ns 269126 Toffoli
BM_sim_Toffoli/9 2639 ns 2639 ns 265605 Toffoli
BM_sim_Toffoli/10 2673 ns 2672 ns 260935 Toffoli
BM_sim_Toffoli/11 2713 ns 2713 ns 258074 Toffoli
BM_sim_Toffoli/12 2747 ns 2747 ns 254612 Toffoli
BM_sim_Toffoli/13 2781 ns 2781 ns 251833 Toffoli
BM_sim_Toffoli/14 2828 ns 2828 ns 247878 Toffoli
BM_sim_Toffoli/15 2862 ns 2862 ns 244129 Toffoli
BM_sim_Toffoli/16 2900 ns 2900 ns 241267 Toffoli
BM_sim_Toffoli/17 2946 ns 2945 ns 237675 Toffoli
BM_sim_Toffoli/18 2973 ns 2973 ns 235327 Toffoli
BM_sim_Toffoli/19 3009 ns 3009 ns 232663 Toffoli
BM_sim_Toffoli/20 3048 ns 3048 ns 229893 Toffoli
BM_sim_Toffoli/21 3089 ns 3089 ns 226788 Toffoli
BM_sim_Toffoli/22 3130 ns 3130 ns 223689 Toffoli
BM_sim_Toffoli/23 3162 ns 3162 ns 221027 Toffoli
BM_sim_Toffoli/24 3198 ns 3198 ns 218803 Toffoli
BM_sim_Toffoli/25 3242 ns 3242 ns 216039 Toffoli
BM_sim_Rx/4 2435 ns 2434 ns 287452 Rx
BM_sim_Rx/5 2602 ns 2602 ns 268946 Rx
BM_sim_Rx/6 2640 ns 2640 ns 265019 Rx
BM_sim_Rx/7 2680 ns 2680 ns 261219 Rx
BM_sim_Rx/8 2722 ns 2722 ns 257203 Rx
BM_sim_Rx/9 2760 ns 2760 ns 253315 Rx
BM_sim_Rx/10 2800 ns 2800 ns 249942 Rx
BM_sim_Rx/11 2842 ns 2842 ns 246519 Rx
BM_sim_Rx/12 2879 ns 2879 ns 242858 Rx
BM_sim_Rx/13 2907 ns 2907 ns 240554 Rx
BM_sim_Rx/14 2954 ns 2954 ns 237051 Rx
BM_sim_Rx/15 2987 ns 2987 ns 233982 Rx
BM_sim_Rx/16 3030 ns 3030 ns 231170 Rx
BM_sim_Rx/17 3063 ns 3063 ns 228366 Rx
BM_sim_Rx/18 3104 ns 3104 ns 225679 Rx
BM_sim_Rx/19 3149 ns 3149 ns 222325 Rx
BM_sim_Rx/20 3190 ns 3190 ns 219700 Rx
BM_sim_Rx/21 3224 ns 3224 ns 217147 Rx
BM_sim_Rx/22 3267 ns 3267 ns 214087 Rx
BM_sim_Rx/23 3311 ns 3311 ns 211444 Rx
BM_sim_Rx/24 3350 ns 3350 ns 208837 Rx
BM_sim_Rx/25 3384 ns 3383 ns 207047 Rx
BM_sim_Ry/4 2433 ns 2432 ns 287713 Ry
BM_sim_Ry/5 2614 ns 2614 ns 269121 Ry
BM_sim_Ry/6 2647 ns 2647 ns 264815 Ry
BM_sim_Ry/7 2686 ns 2686 ns 260722 Ry
BM_sim_Ry/8 2727 ns 2727 ns 257131 Ry
BM_sim_Ry/9 2768 ns 2768 ns 253294 Ry
BM_sim_Ry/10 2807 ns 2807 ns 249006 Ry
BM_sim_Ry/11 2843 ns 2843 ns 246388 Ry
BM_sim_Ry/12 2885 ns 2885 ns 242697 Ry
BM_sim_Ry/13 2915 ns 2915 ns 240148 Ry
BM_sim_Ry/14 2957 ns 2957 ns 236145 Ry
BM_sim_Ry/15 2994 ns 2994 ns 233988 Ry
BM_sim_Ry/16 3038 ns 3038 ns 230809 Ry
BM_sim_Ry/17 3073 ns 3073 ns 227886 Ry
BM_sim_Ry/18 3120 ns 3120 ns 224494 Ry
BM_sim_Ry/19 3158 ns 3158 ns 221596 Ry
BM_sim_Ry/20 3190 ns 3190 ns 219390 Ry
BM_sim_Ry/21 3231 ns 3231 ns 216537 Ry
BM_sim_Ry/22 3277 ns 3277 ns 213957 Ry
BM_sim_Ry/23 3316 ns 3316 ns 211049 Ry
BM_sim_Ry/24 3356 ns 3356 ns 208938 Ry
BM_sim_Ry/25 3394 ns 3393 ns 206282 Ry
I think we should put this benchmark result on hold for now, since Qrack is running this as a stabilizer simulation.
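As a rough check of how the timings above scale with width, the sketch below parses a handful of rows copied from the raw data and compares the 25-qubit time with the 4-qubit time for each gate. The helper names (`parse_rows`, `growth`) are hypothetical and only for illustration. The result is a slowdown of roughly 1.4x, whereas a full-width ket at 25 qubits holds 2^21 times as many amplitudes as one at 4 qubits, so these numbers do not look like a pure state-vector simulation.

```python
# Rows copied verbatim from the raw benchmark output above (subset only).
RAW = """\
BM_sim_X/4 2325 ns 2325 ns 300698 X
BM_sim_X/25 3197 ns 3197 ns 218951 X
BM_sim_H/4 2285 ns 2285 ns 306449 H
BM_sim_H/25 3184 ns 3184 ns 220090 H
BM_sim_T/4 2390 ns 2389 ns 293069 T
BM_sim_T/25 3358 ns 3358 ns 208262 T
"""


def parse_rows(text):
    """Map (gate, width) -> wall time in ns for each benchmark row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: name/width, time, "ns", cpu, "ns", iterations, label
        if len(fields) != 7 or not fields[0].startswith("BM_sim_"):
            continue
        name, width = fields[0].split("/")
        gate = name[len("BM_sim_"):]
        rows[(gate, int(width))] = float(fields[1])
    return rows


def growth(rows, gate, lo=4, hi=25):
    """Ratio of the time at width `hi` to the time at width `lo`."""
    return rows[(gate, hi)] / rows[(gate, lo)]


rows = parse_rows(RAW)
for gate in ("X", "H", "T"):
    print(gate, round(growth(rows, gate), 2))

# Growth in the number of amplitudes of a full ket over the same range.
print("ket amplitude growth:", 2 ** (25 - 4))
```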
Dan from vm6502q/qrack, here! I happened to notice the PR, and I appreciate the consideration!
The `QINTERFACE_OPTIMAL` or `QINTERFACE_OPTIMAL_MULTI` default stacks include layers for both stabilizer and Schmidt decomposition, I can confirm, though they support the general universal `QInterface` API and transparently switch to (Schmidt decomposed or full-width) "ket" representation as necessary. I also appreciate that this might not be an apples-to-apples comparison, therefore. However, note that the simulation is intended to be "exact," despite this simulation method switching. Our "Schrödinger method" interface is anything inheriting from `QEngine`, specifically `QEngineCPU` and `QEngineOCL`, though these should be no more or less "exact" than the default optimal stack, in terms of preserving any and all Hermitian observables, in the sense that the norm of the inner product of a `QEngine` state with a `QUnit` state representation, having both run the same circuit, can always be formed and shown to be 1, or `|<a|b>|^2=1`.
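To make the `|<a|b>|^2=1` criterion concrete, here is a minimal sketch in plain Python that computes that quantity for two explicit state vectors. It does not use Qrack at all; the function name `fidelity` and the example states are assumptions chosen for illustration. Two states that differ only by a global phase give a value of 1 (up to floating-point rounding), while orthogonal states give 0.

```python
import cmath
import math


def fidelity(a, b):
    """Return |<a|b>|^2 for two normalized state vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    # <a|b> conjugates the amplitudes of the first argument.
    inner = sum(complex(x).conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2


s = 1 / math.sqrt(2)
plus = [s, s]      # (|0> + |1>) / sqrt(2)
minus = [s, -s]    # (|0> - |1>) / sqrt(2)

# Same physical state as `plus`, multiplied by an arbitrary global phase.
plus_with_phase = [cmath.exp(0.7j) * amp for amp in plus]

print(fidelity(plus, plus_with_phase))  # same state up to global phase
print(fidelity(plus, minus))            # orthogonal states
```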
Incidentally, I worked for a month on debugging to reach vm6502q.v6.1.1, up to this past Monday, and we honestly caught 20+ separate bugs in that round, in the stabilizer, Schmidt decomposition, and "paging" layers, comprising the `[mirror]` integration test suite in the Qrack repo, now all fixed. However, that seems to exhaust random mirror circuit failures to 27 qubits in width, on my local development machine, just FYI. (The implementation of `test_mirror_circuit` in our `tests/benchmarks.cpp` might make the point here clearer.)
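For readers unfamiliar with the idea, a mirror circuit applies a random gate sequence and then its exact inverse, so an exact simulator must land back on the initial state. The following is a toy sketch of that idea only: it is not Qrack's `test_mirror_circuit`, it uses a small pure-Python state vector with single-qubit gates, and all names in it (`GATES`, `apply_1q`, `mirror_fidelity`) are hypothetical.

```python
import cmath
import math
import random

S2 = 1 / math.sqrt(2)
GATES = {
    "H": [[S2, S2], [S2, -S2]],
    "X": [[0, 1], [1, 0]],
    "T": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
}


def dagger(gate):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(gate[c][r]).conjugate() for c in range(2)] for r in range(2)]


def apply_1q(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a full state vector."""
    out = list(state)
    bit = 1 << target
    for i in range(len(state)):
        if (i & bit) == 0:
            a0, a1 = state[i], state[i | bit]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[i | bit] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def mirror_fidelity(num_qubits, depth, seed):
    """Run a random circuit followed by its inverse; return |<0...0|psi>|^2."""
    rng = random.Random(seed)
    circuit = [(rng.choice(sorted(GATES)), rng.randrange(num_qubits))
               for _ in range(depth)]
    state = [0j] * (1 << num_qubits)
    state[0] = 1 + 0j
    for name, target in circuit:            # forward half
        state = apply_1q(state, GATES[name], target)
    for name, target in reversed(circuit):  # mirrored (inverse) half
        state = apply_1q(state, dagger(GATES[name]), target)
    return abs(state[0]) ** 2


# Should be 1 up to floating-point rounding for an exact simulation.
print(mirror_fidelity(num_qubits=3, depth=20, seed=1))
```

A result noticeably below 1 would indicate that some layer of the simulator lost information along the way, which is the kind of failure such a test is meant to expose.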
Thanks again!
If the desired comparison is strictly between ket-based simulations, the `QHybrid` layer of Qrack on its own is what we're looking for.
In C++, the simulator factory can be initialized with just the `QINTERFACE_HYBRID` enum value (instead of `QINTERFACE_OPTIMAL` or `QINTERFACE_OPTIMAL_MULTI`).
In Python, a PyQrack simulator instance can be constructed with the kwargs `isMultiDevice=False, isSchmidtDecompose=False, isStabilizerHybrid=False, isBinaryDecisionTree=False, isPaged=False, is1QbFusion=False, isCpuGpuHybrid=True`. (That is, all options should be off except `isCpuGpuHybrid`.)
The "hybrid" part of this case is hybridization of CPU and GPU, but it's still strictly ket. (If we want to fine-grain between CPU and GPU based simulation techniques, we can do that, too.)
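The Python configuration described above could be written roughly as follows. The keyword names are copied verbatim from the comment; whether a given PyQrack release accepts all of them is an assumption, since constructor options can change between versions. The helper `make_ket_only_simulator` is hypothetical and imports `pyqrack` lazily, so the snippet itself runs without the package installed.

```python
# Keyword arguments copied from the comment above. Names may differ between
# PyQrack releases, so treat this as a sketch rather than a fixed API.
KET_ONLY_KWARGS = {
    "isMultiDevice": False,
    "isSchmidtDecompose": False,
    "isStabilizerHybrid": False,
    "isBinaryDecisionTree": False,
    "isPaged": False,
    "is1QbFusion": False,
    "isCpuGpuHybrid": True,
}


def make_ket_only_simulator(num_qubits):
    """Build a simulator with every layer off except CPU/GPU hybridization.

    Hypothetical helper: requires the `pyqrack` package, and assumes its
    QrackSimulator constructor accepts the keyword names listed above.
    """
    from pyqrack import QrackSimulator  # lazy import; only needed when called

    return QrackSimulator(num_qubits, **KET_ONLY_KWARGS)


# Sanity check on the configuration itself: only one option is switched on.
enabled = [name for name, on in KET_ONLY_KWARGS.items() if on]
print(enabled)
```

Keeping the options in one dictionary makes it easy to log the exact configuration next to each benchmark result.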
Added benchmarks for the following gates using the Qrack simulator
The setup.sh file has some kinks that need to be worked out, but the benchmark works. Will push another revision once I can iron those out.