Including derivative in parallel L-BFGS-B method

AdrianPerezSalinas commented 3 years ago

Hi everyone, have a Happy New Year.

We are most of the time using variational circuits that depend on some tunable parameters. We have been using scipy methods to find the optimal parameters, and recently the parallel L-BFGS-B method was added to the repository. My proposal is to extend this method to the case where the gradient of the function is given for optimizing. I think this is useful for two main reasons:

We can implement exact gradients for quantum circuits as in the finite differences method. This will be helpful when optimization is done using quantum hardware instead of simulation because this gradient is much more resilient to noise.
In particular for reuploading-like circuits, the number of operations will be smaller. The finite differences method works by computing f(\theta +/- \pi / 2), where \theta is just one rotation angle. In reuploading circuits, \theta = w x + b, so with only one set of computation we find the gradient for 2 parameters (and more and more if more weights are used)

I have been looking to the code and saw that the core of the computation is somehow delegated to the standard scipy recipe with self.mp.Pool(processes=self.processes) as self.pool: from scipy.optimize import minimize out = minimize(fun=self.fun, x0=x0, jac=self.jac, method='L-BFGS-B', bounds=self.bounds, callback=self.callback, options=self.options)

Thus, it should be easy to implement what I ask for. The first step should be allowing the use of the keyword fprime which is already defined for this function. This is trivial, I think. The second step should be including the gradient function. However, this could be left to the user to pass as an argument of the code.

What do you think?

scarrazza commented 3 years ago

@AdrianPerezSalinas thanks for this issue. I think this is feasible, by computing the analytic gradients manually or automatically (via tensorflow, or other backends).

AdrianPerezSalinas commented 3 years ago

Do you think automatic gradients are compatible with the finite differences techniques?

scarrazza commented 3 years ago

Usually, automatic analytic gradients are more efficient than finite differences, which are already computed by the scipy minimize.

AdrianPerezSalinas commented 3 years ago

I think I did not explain myself properly. When I say finite differences for quantum circuits I am talking about a method that allows to compute the exact analytical gradient by shifting the values of the parameters a large quantity (for most operators this quantity is pi/2). This is robust against inherent statistical noise, so it is useful for the experiment. See Eqs. 13 and 14 from here: https://arxiv.org/pdf/1811.11184.pdf

scarrazza commented 3 years ago

Thanks for the clarification, sorry for the misunderstanding. Yes, this is something which may help, and for sure interesting to have build-in.

AdrianPerezSalinas commented 3 years ago

Nice! We can discuss it tomorrow

AdrianPerezSalinas commented 3 years ago

Hi @scarrazza , I leave here a document explaining my point with more details. Hope you find it useful! main.pdf

AdrianPerezSalinas commented 3 years ago

Hi @scarrazza , I have been testing the method I proposed to you. I have not done so much, but results are pretty promising. I have done a test for the standard VQE with small circuits, as states in QIBO's examples. If I compute the expected value of the hamiltonian exactly (0 shots), both methods return the same optimization path VQE_0shots.pdf

However, if I do the same with measurements (approximation is kind of rough), optimization is really different. It gets stuck quickly since the derivatives do not give any information. VQE_10000000.0shots.pdf

In addition it looks like it does not matter how many shots you perform to estimate the hamiltonian, results are not good

Today I will check the next steps

scarrazza commented 3 years ago

Ok, thanks for this tests, lets see.

AdrianPerezSalinas commented 3 years ago

I have tested a fitting problem like qPDF, results are comparable fit_0shots.pdf fit_10000shots.pdf

AdrianPerezSalinas commented 3 years ago

Same test for an easy classifier, classifier_0shots.pdf classifier_10000shots.pdf

I think that it is clear, we need to implement this functionality when applying optimization to measurements. In addition, at least for measurement simulations, it is way faster to do it in this new way.

Do you want me to show you the code?

alhajri commented 3 years ago

Hi @AdrianPerezSalinas , just wondering if you are aware of this paper by Simon Benjamin on "Quantum Analytic Descent": https://arxiv.org/abs/2008.13774

AdrianPerezSalinas commented 3 years ago

I was not aware of it, but thank you very much for pointing it out!

I have taken a look at it and it sounds really interesting. However, after the discussion today I think that the exact derivatives method or this one can reduce the error due to sampling, but cannot deal with imperfect circuits. We will have to further investigate about it

AdrianPerezSalinas commented 3 years ago

I leave here the results of some tests made with the exact derivative and a VQE model. As you may see, with errors of order 0.1% we can still have some minimization. Noisier circuits are chaotic.

VQE_100000.0shots.pdf

I think that implementing this kind of exact derivatives could be interesting only if we know that the circuit noise is below a certain threshold.

scarrazza commented 3 years ago

OK and how this compares to the numerical derivative for similar configurations?

AdrianPerezSalinas commented 3 years ago

Nothing returns result extremely good, but exact derivatives are more resilient to noise and errors than numerical ones. Not too much, though

VQE_0shots_0.01.pdf VQE_0shots_0.001.pdf VQE_10000shots_0.01.pdf VQE_10000shots_0.001.pdf

qiboteam / qibo

Including derivative in parallel L-BFGS-B method #309