yardstiq / quantum-benchmarks

benchmarking quantum circuit emulators for your daily research usage

Improving Qiskit Aer benchmarks #9

Closed chriseclectic closed 4 years ago

chriseclectic commented 4 years ago

Hello @Roger-luo, I am a developer of Qiskit Aer and was recently shown your rather nice benchmark repo. I have some suggestions for how the Qiskit benchmarks could be improved, since I feel they are under-representing the simulator.

Suggestions:

Depending on what you are trying to show in the benchmarks, different timings matter. I would argue that for the gate-level benchmarks you should show the C++ times, but for circuit-level benchmarks that include results you would actually use, I would show the Python time.

If you like, I could put in a PR to this repo to make some of the suggested changes, but below I've included a code snippet for applying these suggestions to a manual implementation of your X-gate benchmark:

import numpy as np
from qiskit import *
import time
import matplotlib.pyplot as plt

def native_execute(circuit, backend, backend_options):
    experiment = transpile(circuit, backend)  # Transpile to simulator basis gates
    qobj = assemble(experiment, shots=1)  # Set execution shots to 1
    start = time.time()
    result = backend.run(qobj, backend_options=backend_options).result()
    stop = time.time()
    time_py_full = stop - start  # Total execution time in python
    time_py_run = result.time_taken  # Total run time reported on the Python Result object
    time_cpp_full = result.metadata['time_taken']  # C++ measured total execution time excluding conversion of C++ results to Py results, and Py qobj to C++ qobj
    time_cpp_expr = result.results[0].time_taken  # C++ measured execution time of a single circuit (i.e. state init and gate application, excluding other setup overhead for config options etc.)
    return time_py_full, time_py_run, time_cpp_full, time_cpp_expr

def benchmark_x(qubit_range, samples, backend_options=None):

    backend = Aer.get_backend('qasm_simulator')

    ts_py_full = np.zeros(len(qubit_range))
    ts_py_run = np.zeros(len(qubit_range))
    ts_cpp_full = np.zeros(len(qubit_range))
    ts_cpp_exp = np.zeros(len(qubit_range))

    for i, nq in enumerate(qubit_range):
        qc = QuantumCircuit(nq)
        qc.x(0)

        t_py_full = 0
        t_py_run = 0
        t_cpp_full = 0
        t_cpp_exp = 0

        for _ in range(samples):
            t0, t1, t2, t3 = native_execute(qc, backend, backend_options)
            t_py_full += t0
            t_py_run += t1
            t_cpp_full += t2
            t_cpp_exp += t3

        # Average time in ns
        ts_py_full[i] = 1e9 * t_py_full / samples
        ts_py_run[i] = 1e9 * t_py_run / samples
        ts_cpp_full[i] = 1e9 * t_cpp_full / samples
        ts_cpp_exp[i] = 1e9 * t_cpp_exp / samples

    return ts_py_full, ts_py_run, ts_cpp_full, ts_cpp_exp

# Benchmark: X gate on qubit-0
backend_options = {
    # Force Statevector method so stabilizer (clifford) simulator isn't used
    "method": "statevector",

    # Disable parallelization
    "max_parallel_threads": 1,

    # Stop simulator truncating to 1-qubit circuit simulations
    "truncate_enable": False,  
}  

nqs = list(range(5, 26))
ts_py_full1, ts_py_run1, ts_cpp_full1, ts_cpp_expr1 = benchmark_x(nqs, 1000, backend_options)

plt.semilogy(nqs, ts_py_full1, 'o-', label='Python (full)')
plt.semilogy(nqs, ts_py_run1, 's-', label='Python (run-only)')
plt.semilogy(nqs, ts_cpp_full1, '^-', label='C++ (full)')
plt.semilogy(nqs, ts_cpp_expr1, 'd-', label='C++ (experiment-only)')
plt.xlabel('Number of qubits')
plt.ylabel('Average time (ns)')
plt.legend()
plt.grid()
plt.savefig('aer_x_qasm_sv.pdf')

Here is an example of running the above on my laptop:

aer_x_qasm_sv.pdf

Roger-luo commented 4 years ago

Hi @chriseclectic, thanks for your comment! It'd be nice if you could include the suggested changes in a PR so I could help review them (since I'm still not sure which options you would like to put into the Qiskit benchmark), and I think we could also include multiple benchmarks for Qiskit. Please feel free to open a PR first, and I can help you edit it.

In principle, as a demonstration of the actual running time of simulation in practice, I think we should use the user interface as much as possible. It's fair to benchmark through C++ if the C++ interface is an official API for users, but if the interface is in Python, we should in principle benchmark the Python interface (through the standard Python benchmark framework pytest-benchmark). For single-gate benchmarks, yes, this is a test of the implementation of each instruction: it shows whether certain acceleration tricks, e.g. SIMD, are applied, and whether the simulation algorithm is correct. But we currently do not have a C++ benchmark setup. This was also discussed in QuEST's benchmark review: https://github.com/Roger-luo/quantum-benchmarks/issues/5

This is what we do for the other frameworks, but qiskit is the only exception at the moment. I had to write a custom execute function since I was not familiar with the Qiskit simulation backend, and I found that the user interface spawns an async task, which produces constant times in pytest-benchmark measurements.
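For reference, a minimal sketch of how such a synchronous execute could be hooked into pytest-benchmark (the benchmark fixture is provided by the plugin; pre-assembling the qobj and calling backend.run directly, as in the native_execute snippet above, keeps the compile step out of the measured region):

import pytest
from qiskit import QuantumCircuit, Aer, assemble, transpile

@pytest.mark.parametrize("nqubits", range(5, 26))
def test_x_gate(benchmark, nqubits):
    backend = Aer.get_backend('qasm_simulator')
    qc = QuantumCircuit(nqubits)
    qc.x(0)
    # Transpile and assemble once, outside the timed region
    qobj = assemble(transpile(qc, backend), shots=1)
    # benchmark() calls the function repeatedly and records the timings;
    # running the pre-assembled qobj avoids the user-facing execute() wrapper
    benchmark(lambda: backend.run(qobj).result())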

Regarding the stabilizer simulator: is this a fair comparison with the other simulators? The benchmark was mainly made for variational circuits (at least at the moment), and all the other frameworks are benchmarking full amplitude simulation. Maybe stabilizer simulators should be benchmarked against other stabilizer simulators?

I'd love to have more professional benchmark scripts from the developers themselves for sure (which is what we did for the other frameworks). Thanks!

chriseclectic commented 4 years ago

I'm getting a PR ready that will update to the correct simulator backend, and will leave the timing as you currently have it set up with the native_execute function. We don't expose our C++ API directly, so going through Python is fine; the Python overhead will just appear as a constant run time at low qubit numbers. Given how pytest works, I think the native_execute function you have is the best approach, since it bypasses the async job model for Qiskit providers.

As part of the PR I also enabled the QCBM circuit benchmarks, since these are supported. The "native" gate set of the simulator doesn't matter, since the Qiskit compiler handles conversion to the supported basis gates (e.g. Rx -> u3 and Rz -> u1). Another point: we just released an update to Qiskit Aer (version 0.4) a few days ago, which means the current native_execute function will no longer work (the internal method was renamed from _format_qobj_str to _format_qobj). This release also included our first version of a GPU-enabled simulator. Currently the GPU-enabled simulator is only available for Linux and can be installed separately with pip install qiskit-aer-gpu. I can also add this to the benchmark scripts for the QCBM circuit.
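A hedged sketch of what selecting the GPU simulator could look like with the benchmark_x function above, assuming qiskit-aer-gpu is installed (Linux only) and that this release exposes the GPU statevector via a "statevector_gpu" method name (check the Aer docs of the release for the exact spelling):

# Re-run the X-gate benchmark on the GPU simulator
gpu_backend_options = {
    "method": "statevector_gpu",  # assumed GPU method name for this release
    "truncate_enable": False,     # same setting as the CPU run above
}
ts_gpu = benchmark_x(nqs, 1000, gpu_backend_options)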

One question about the benchmarks, and in particular the GPU benchmarks: are you running the other configurations in single precision or double precision? We support both options for both CPU and GPU (the default is double precision).
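For illustration, a sketch of how single precision could be requested through backend_options. The "precision" option name is an assumption here (newer Aer releases spell it this way; the 0.4 release may expose it differently):

# Hypothetical single-precision run of the same X-gate benchmark
single_prec_options = {
    "method": "statevector",
    "precision": "single",        # assumed option name; default is "double"
    "max_parallel_threads": 1,
    "truncate_enable": False,
}
ts_single = benchmark_x(nqs, 1000, single_prec_options)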

With regards to the Stabilizer simulator, I agree it's not fair to compare it to a statevector simulator, because it can only simulate Clifford circuits. However, our simulator will choose it automatically if the input circuit is Clifford, which is why you need to explicitly specify running on the statevector method.
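A hedged sketch of one way to observe this automatic method selection (the per-experiment metadata field reporting the method is an assumption based on the result format used in the snippet above):

# An X gate is a Clifford operation, so without an explicit method Aer
# may dispatch this circuit to the stabilizer simulator
from qiskit import QuantumCircuit, Aer, assemble, transpile

backend = Aer.get_backend('qasm_simulator')
qc = QuantumCircuit(5)
qc.x(0)
qobj = assemble(transpile(qc, backend), shots=1)

auto = backend.run(qobj).result()
forced = backend.run(qobj, backend_options={"method": "statevector"}).result()

# The per-experiment metadata reports which method actually ran
# (field name assumed from the Aer result format)
print(auto.results[0].metadata)    # expect the stabilizer method here
print(forced.results[0].metadata)  # expect statevector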

chriseclectic commented 4 years ago

Here is the updated circuit benchmark for qiskit, run on a server with a P100 GPU. Note I didn't re-run any of the other simulator benchmarks; I just used the existing data in the repo.

pcircuit_relative