Closed: chriseclectic closed this issue 4 years ago.
Hi @chriseclectic Thanks for your comment! It'd be nice if you could include the suggested changes in a PR, so I could help review it (since I'm still not sure what kind of options you would like to put into the qiskit benchmark), and I think we could also include multiple benchmarks for qiskit. Please feel free to open a PR first, and I can help you edit it.
In principle, as a demonstration of actual running time for simulation in practice, I think we should use the user interface as much as possible. It's fair to benchmark through C++ if the C++ interface is an official API for users; but if the interface is in Python, we should in principle benchmark the Python interface (through the standard Python benchmark framework pytest-benchmark). As for the single-gate benchmarks: yes, these test the implementation of each instruction; they show whether certain acceleration tricks (e.g., SIMD) are applied, and whether the simulation algorithm is correct. But we currently do not have a C++ benchmark setup. This was also discussed in QuEST's benchmark review: https://github.com/Roger-luo/quantum-benchmarks/issues/5
This is what we do for the other frameworks; qiskit is the only exception at the moment. I had to write a custom execute function because I was not familiar with the Qiskit simulation backend, and I found that the user interface spawns a task that shows up as constant time in pytest-benchmark measurements.
Regarding the stabilizer simulator: is this a fair comparison for the other simulators? The benchmark was mainly made for variational circuits (at least at the moment), and all the other frameworks are benchmarking full-amplitude simulation. Maybe we should benchmark stabilizer simulators against other stabilizer simulators?
I'd certainly love to have more professional benchmark scripts from the developers themselves (which is what we did for the other frameworks). Thanks!
I'm getting a PR ready that will update to the correct simulator backend, and will leave the timing as you currently have it set up with the `native_execute` function. We don't expose our C++ API directly, so going through Python is fine. The Python overhead will just appear as a constant run-time for low qubit numbers. Given how pytest works, I think the `native_execute` function you have is the best approach, since it bypasses the async job model for Qiskit providers.
As part of the PR I also enabled the QCBM circuit benchmarks, since these are supported. The "native" gate set of the simulator doesn't matter, since the Qiskit compiler handles conversion to the supported basis gates (e.g. Rx -> u3 and Rz -> u1). Another point: we released an update to Qiskit Aer (version 0.4) a few days ago, which means the current `native_execute` function will no longer work (the internal method was renamed from `_format_qobj_str` to `_format_qobj`). This release also included our first version of a GPU-enabled simulator. Currently the GPU-enabled simulator is only available for Linux and can be installed separately with `pip install qiskit-aer-gpu`. I can also add this to the benchmark scripts for the QCBM circuit.
One question about the benchmarks, and in particular the GPU benchmarks: are you running the other configurations in single precision or double precision? We support both options for both CPU and GPU (the default is double precision).
With regards to the stabilizer simulator, I agree it's not fair to compare it to a statevector simulator, because it can only simulate Clifford circuits. However, our simulator will choose it automatically if the input circuit is Clifford, which is why you need to explicitly specify running with the statevector method.
Here is the updated circuit benchmark for qiskit, run on a server with a P100 GPU. Note that I didn't re-run any of the other simulator benchmarks; I just used the existing data in the repo.
Hello @Roger-luo, I am a developer of Qiskit Aer and was recently shown your rather nice benchmark repo. I have some suggestions for how the qiskit benchmarks could be improved, since I feel they are under-representing the simulator.
Suggestions:
When you transpile the circuit in qiskit, you need to include the backend so that it compiles to the native basis gates of the simulator; otherwise it will unroll all single-qubit gates to u3 gates.
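As a sketch of the difference (assuming a 0.4-era qiskit/qiskit-aer install; the import path and behavior may differ in newer versions, so the snippet is guarded rather than authoritative):

```python
try:
    from qiskit import QuantumCircuit, transpile
    from qiskit.providers.aer import QasmSimulator  # 0.4-era import path

    circ = QuantumCircuit(1)
    circ.rx(0.5, 0)

    # With an explicit generic basis the compiler unrolls single-qubit
    # gates to u3, as described above.
    generic = transpile(circ, basis_gates=["u3", "cx"])
    # With the backend, the compiler targets the simulator's basis gates.
    native = transpile(circ, backend=QasmSimulator())

    print("generic basis:", generic.count_ops())
    print("backend basis:", native.count_ops())
    snippet_ok = True
except Exception:
    snippet_ok = True  # qiskit-aer not installed, or API drift; sketch only
```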
You shouldn't be using the statevector simulator for benchmarks; rather, you should use the qasm_simulator. The statevector simulator has a lot of overhead in serializing the statevector via JSON, whereas the qasm simulator does not (you can still ask for a snapshot of the statevector in the qasm simulator). This overhead has been improved somewhat in our next release, due to replacing JSON with Pybind11, but it still under-represents the simulator if you are interested in timing how fast it applies gates.
The qasm simulator has numerous options for method and parallelization that you may want to explicitly configure, e.g.:
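A sketch of what such a configuration might look like. The option names follow the 0.4-era Aer QasmSimulator documentation and may vary between versions; treat them as assumptions to check against your install:

```python
# Option names per the 0.4-era Aer QasmSimulator docs (illustrative only).
backend_options = {
    "method": "statevector",        # force statevector; "automatic" would select
                                    # the stabilizer method for Clifford circuits
    "max_parallel_threads": 0,      # 0 = use all available CPU cores
    "max_parallel_experiments": 1,  # circuits simulated in parallel
    "max_parallel_shots": 1,        # shots simulated in parallel
}

# Typically passed at run time, e.g.:
#   backend.run(qobj, backend_options=backend_options)
print(sorted(backend_options))
```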
How you report the time taken depends on what you are trying to benchmark. Aer includes a lot of overhead in its result data output, so if you are trying to profile the time of a single gate, you can get a more accurate measure by excluding the result serialization if desired. The different ways of timing include:

- The full Python execution time of `backend.run`.
- The time reported on the result object (`Result.time_taken`). This excludes the time spent initializing and validating the Python result object from the output Python dict of the simulator.
- The time reported in the result metadata (`Result.metadata['time_taken']`). This excludes the C++ -> Py result conversion overhead.
- The per-experiment time from the C++ simulator (`Result.results[0].time_taken`). This excludes any overhead for validation and configuration settings in the C++ simulator, and any Py -> C++ conversion.

Depending on what you are trying to show in benchmarks, different timings are more important. I would argue that for the gate-level benchmarks you should show the C++ times, but for circuit-level benchmarks that include results you would actually use, I would show the Python time.
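To make the distinction concrete, here is a small helper that collects the three reported timings from a result dictionary. The dictionary shape is an assumption modeled on `Result.to_dict()`, and the numbers in the mock are made up:

```python
def extract_timings(result_dict):
    """Collect the nested time_taken values described above from a
    Result.to_dict()-style dictionary (shape assumed, not guaranteed)."""
    return {
        "result": result_dict.get("time_taken"),                        # Result.time_taken
        "metadata": result_dict.get("metadata", {}).get("time_taken"),  # Result.metadata['time_taken']
        "experiment": result_dict["results"][0].get("time_taken"),      # Result.results[0].time_taken
    }

# Mock result with made-up numbers, mimicking the assumed structure:
mock = {
    "time_taken": 0.051,
    "metadata": {"time_taken": 0.040},
    "results": [{"time_taken": 0.012}],
}
print(extract_timings(mock))
```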
If you like, I could put in a PR to this repo making some of the suggested changes, but below I've included a code snippet applying these suggestions to a manual implementation of your X-gate benchmark:
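The snippet itself did not survive in this copy of the thread, so the following is only a minimal sketch in its spirit, assuming the 0.4-era API (`QasmSimulator`, `assemble`, `backend_options`); it is guarded so it degrades to a no-op when qiskit-aer is unavailable or the API has changed:

```python
import time

try:
    from qiskit import QuantumCircuit, transpile
    from qiskit.compiler import assemble
    from qiskit.providers.aer import QasmSimulator  # 0.4-era import path

    backend = QasmSimulator()
    nqubits = 10
    circ = QuantumCircuit(nqubits)
    for q in range(nqubits):
        circ.x(q)

    # Transpile against the backend so X stays in the native basis
    # instead of being unrolled to u3.
    qobj = assemble(transpile(circ, backend=backend), shots=1)

    start = time.perf_counter()
    result = backend.run(qobj, backend_options={"method": "statevector"}).result()
    elapsed = time.perf_counter() - start
    print("python time:", elapsed, "reported:", result.time_taken)
    snippet_ok = True
except Exception:
    snippet_ok = True  # qiskit-aer missing or API drift; sketch only
```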
Here is an example of running the above on my laptop:
aer_x_qasm_sv.pdf