qiskit-community / qiskit-experiments

Qiskit Experiments
https://qiskit-community.github.io/qiskit-experiments/
Apache License 2.0

add bayesian_randomized_benchmarking tutorial #377

Open pdc-quantum opened 3 years ago

pdc-quantum commented 3 years ago

What is the expected behavior?

Part of the ongoing Qiskit advocates mentorship program, Fall 21: issue 34.

Mentor: Shelly Garion (@ShellyGarion), Research Staff Member at IBM Research Haifa, Qiskit developer.

Mentees: Pierre Decoodt (@pdc-quantum) and SheshaShayee Raghunathan (@shesha-raghunathan).

A Bayesian PyMC3 implementation built on top of the frequentist models for standard and interleaved randomized benchmarking (RB) featured in Qiskit Experiments.

Based on the tutorial randomized_benchmarking. The RB parameters and the analysis variables are identical.

The protocols for standard and interleaved RB are described in the first two entries of Table 1 of this paper. The Bayesian hierarchical models are based on equation 13 of the same paper.
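For context, the standard RB survival model and the error per Clifford (EPC) derived from it can be sketched in a few lines. This is a minimal NumPy illustration of the standard decay curve only; all parameter values are invented for the example, not taken from the paper:

```python
import numpy as np

# Standard RB decay model: P(m) = A * alpha**m + B, where m is the
# Clifford sequence length. The error per Clifford follows from alpha:
#   EPC = (d - 1) * (1 - alpha) / d, with d = 2**n_qubits.
# In interleaved RB, the interleaved gate's error is estimated from the
# ratio alpha_C / alpha of the two fitted decay constants.
# All parameter values below are illustrative.

n_qubits = 2
d = 2 ** n_qubits
A, B, alpha = 0.7, 0.25, 0.98         # assumed ground-truth decay parameters

lengths = np.arange(1, 200, 30)       # Clifford lengths (7 points)
p_survive = A * alpha ** lengths + B  # ideal survival probabilities

epc = (d - 1) * (1 - alpha) / d
print(f"EPC = {epc:.5f}")             # 0.01500 for alpha = 0.98
```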

Posterior sampling is performed with the No-U-Turn Sampler featured in PyMC3.

ArviZ is used for the exploratory analysis of the results.
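The hierarchical model itself lives in PyMC3 and its posterior is explored with ArviZ, but the core Beta-prior / Binomial-likelihood structure can be illustrated without either dependency. The sketch below (invented counts, not from the tutorial) uses the conjugate Beta posterior for a single circuit's survival probability:

```python
import numpy as np

# Beta(1, 1) prior on the survival probability, updated with Binomial
# counts from one circuit. The Beta is conjugate to the Binomial, so the
# posterior is Beta(a0 + k, b0 + n - k) in closed form.
a0, b0 = 1.0, 1.0
shots, successes = 1024, 900          # illustrative measured counts

a_post = a0 + successes
b_post = b0 + shots - successes
post_mean = a_post / (a_post + b_post)

# 94% equal-tailed credible interval from posterior samples
rng = np.random.default_rng(0)
samples = rng.beta(a_post, b_post, size=100_000)
lo, hi = np.percentile(samples, [3, 97])
print(f"posterior mean = {post_mean:.4f}, 94% interval = [{lo:.4f}, {hi:.4f}]")
```

In the real tutorial these survival probabilities are tied together across lengths by the RB decay model, which is what makes the model hierarchical and requires MCMC rather than a closed-form update.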

This demo shows that, on a simulator, the Bayesian approach delivers a good estimate of the error per Clifford and per gate, within narrow bounds. Experiments on hardware (out of the scope of this issue) were also conclusive.

For examples of simulator and hardware experiments, see: bayesan-randomized-benchmarking.pdf

pdc-quantum commented 2 years ago

Note: demonstrating the advantage of the Bayesian approach

The aim is to optimize the total number of experiments: (number of lengths M) x (number of circuits per length I). The graph presents results obtained with the FakeBelem backend on cx [1,2]. The backend reference value for EPG is 0.01069. The Bayesian sequential Monte Carlo algorithm (SMC) was compared to the existing frequentist least squares fit (LSF) of Qiskit Experiments. The circuit lengths were range(1, 200, 30) for M = 7 and range(1, 200, 15) for M = 14. The explored numbers of circuits per length were 60 and 120. The number of shots was 1024.

Increasing I or increasing M results in an EPG estimate nearer to the target; increasing both I and M was even better. Narrower error bounds (error bars) were observed with SMC. The reported error bounds decreased when increasing M or I with SMC, but only when increasing M with LSF. So, for a given total number of circuits, the combination of M and I in SMC can be tuned to privilege either the EPG estimate or its upper bound. This is not possible with the existing LSF, where the bounds can only be narrowed by increasing M.

[image]
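As a toy illustration of why increasing I narrows the bounds (purely synthetic numbers, not the FakeBelem data): the spread of the mean survival estimate at a fixed length shrinks roughly as 1/sqrt(I):

```python
import numpy as np

# Empirical spread of the mean survival-probability estimate as a function
# of the number of circuits per length I. Purely synthetic illustration.
rng = np.random.default_rng(3)
shots, p_true = 1024, 0.9

def spread_of_mean(n_circuits, reps=2000):
    """Std dev of the mean survival estimate over n_circuits circuits."""
    counts = rng.binomial(shots, p_true, size=(reps, n_circuits))
    return (counts.mean(axis=1) / shots).std()

s60, s120 = spread_of_mean(60), spread_of_mean(120)
print(f"I=60: {s60:.5f}, I=120: {s120:.5f}")  # roughly a sqrt(2) reduction
```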

pdc-quantum commented 2 years ago

Update on demonstrating the value of the Bayesian method

When comparing the results of the Bayesian model with those of the frequentist model in Qiskit Experiments, a bug in the latter was suspected: https://github.com/Qiskit/qiskit-experiments/issues/428. It consists of excessive bounds for the error per gate (EPG), which do not vary with the number of circuits per length. This is now the subject of a PR: https://github.com/Qiskit/qiskit-experiments/pull/472

Consequently the figure presented in https://github.com/Qiskit/qiskit-experiments/issues/377#issuecomment-942738602 becomes:

[image]

The bounds are now of the same order.

When testing the model on hardware, the average error on EPG was slightly lower using the Bayesian model. See this comment.

In fact, claiming that the Bayesian method is better because it delivers narrower error bounds is open to criticism: an entirely incorrect statistical model could deliver very narrow bounds that do not reflect reality. The upper-bound problem is the most relevant: when figuring out the hardware gate error, you need to be confident about how many nines the fidelity really has (see this tweet). At the end of the conclusions of this paper, the possibility of a systematic tendency to over-report gate qualities with a frequentist model is mentioned.

One can argue that the Bayesian model is justified for two main reasons:

1. It is based on priors and likelihoods (e.g. Beta and Binomial distributions) that are more realistic than those implicit in the least squares fitter.
2. It delivers an inferred distribution through the MCMC process that better reflects the underlying real distribution. From this inferred distribution we get the HDI (highest-density interval) and a credible upper bound for the gate error. An inferred distribution is also more likely to capture the complex noise processes encountered on hardware.
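The HDI is simply the narrowest interval containing a given posterior mass. ArviZ provides this as `az.hdi`; the sketch below is a minimal NumPy reimplementation over raw samples, for illustration only (the Beta posterior used as input is invented):

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Highest-density interval: the narrowest window containing
    `prob` of the sorted posterior samples."""
    s = np.sort(np.asarray(samples))
    n_in = int(np.floor(prob * len(s)))
    widths = s[n_in:] - s[: len(s) - n_in]
    i = int(np.argmin(widths))
    return s[i], s[i + n_in]

# Skewed posterior, e.g. for a gate fidelity close to 1; for such
# distributions the HDI differs from the equal-tailed interval.
rng = np.random.default_rng(1)
draws = rng.beta(50, 2, size=50_000)
lo, hi = hdi(draws)
print(f"94% HDI = [{lo:.4f}, {hi:.4f}]")
```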

pdc-quantum commented 2 years ago

The comments for the final showcase of qamp-fall-21 are now available here.

More experimentation was performed; the protocols and results are described in the comments. As these studies were run on hardware, their findings reasonably supersede what was found on a simulator. The main finding was that the Bayesian statistical model narrowed, on average, the EPG error bounds across a variety of RB protocols.