yardstiq / quantum-benchmarks

benchmarking quantum circuit emulators for your daily research usage

Review PennyLane #7

Closed johannesjmeyer closed 4 years ago

johannesjmeyer commented 4 years ago

Hi @Roger-luo,

I just reviewed your PennyLane benchmarks. I also added single-gate benchmarks for the sake of completeness, but one cannot faithfully include them in the comparison, for the following reason:

In PennyLane, circuits are represented by quantum functions like

@qml.qnode(dev)
def circuit(vars):
    qml.PauliX(0)
    qml.PauliX(1)
    ...
    return qml.expval(qml.PauliZ(0))

As PennyLane is primarily a library for gradient propagation, every such quantum function must return either an expectation value, a variance or a sample. As default.qubit is a simulator, it calculates the output state of the circuit and then applies some post-processing (marginalizing the state and calculating the expectation value or variance, or generating a sample via np.random.choice). In the case of small circuits, this post-processing creates most of the cost, especially for a large number of qubits.
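To make the post-processing step concrete, here is a minimal pure-Python sketch of what happens after the state vector is known. It is not PennyLane's implementation: the function names, the list-of-amplitudes representation, and the convention that wire 0 is the most significant bit are all assumptions for illustration, and `random.choices` stands in for np.random.choice.

```python
import random

def expval_pauli_z(state, wire, n_qubits):
    """<Z> on one wire, computed from a full state vector.

    Assumption: wire 0 is the most significant bit of the basis index.
    """
    shift = n_qubits - 1 - wire
    total = 0.0
    for index, amplitude in enumerate(state):
        bit = (index >> shift) & 1
        prob = abs(amplitude) ** 2
        # Z has eigenvalue +1 when the bit is 0 and -1 when it is 1
        total += prob if bit == 0 else -prob
    return total

def sample_basis_states(state, shots, seed=0):
    """Draw basis-state indices with probability |amplitude|^2."""
    probs = [abs(a) ** 2 for a in state]
    rng = random.Random(seed)
    return rng.choices(range(len(state)), weights=probs, k=shots)

# PauliX on wires 0 and 1 of a 2-qubit register maps |00> to |11>
state = [0j, 0j, 0j, 1 + 0j]
print(expval_pauli_z(state, wire=0, n_qubits=2))  # -1.0
print(sample_basis_states(state, shots=3))        # [3, 3, 3]
```

Both functions visit all 2^n amplitudes, so for a circuit with only a handful of gates this step can easily cost more than the gate applications themselves.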

One could test the gate application times of default.qubit (they are at least twice as fast as Cirq at the moment), but then one has to circumvent the regular user interface.

So my suggestion is to either not include PennyLane in the gate comparison or, if that is okay for you, to circumvent the user interface to access the raw gate application times.

What do you think @Roger-luo?

johannesjmeyer commented 4 years ago

I actually figured out a way to integrate PennyLane realistically. I modified the underlying device so that it raises an exception after the gate applications are executed and before the expectation values etc. are calculated. This way it both uses the UI and still gives a fair comparison.
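The pattern described above can be sketched roughly as follows. This is only an illustration of the idea on a toy device: `ToyDevice`, `GateOnlyDevice`, `StopAfterGates` and `time_gates_only` are hypothetical names, not PennyLane classes, and the actual change made to the benchmark may look different.

```python
import time

class StopAfterGates(Exception):
    """Raised to cut a run short once all gates have been applied."""

class ToyDevice:
    """Hypothetical stand-in for a simulator device (not PennyLane's API)."""

    def __init__(self):
        self.log = []

    def apply_gates(self, gates):
        for gate in gates:
            self.log.append(("gate", gate))

    def postprocess(self):
        # Stands in for marginalizing, expectation values, sampling, ...
        self.log.append(("postprocess",))
        return 0.0

    def execute(self, gates):
        # The regular user-facing path: gates first, then post-processing
        self.apply_gates(gates)
        return self.postprocess()

class GateOnlyDevice(ToyDevice):
    """Same entry point, but aborts right before post-processing."""

    def postprocess(self):
        raise StopAfterGates

def time_gates_only(device, gates):
    """Time one run through the normal entry point, minus post-processing."""
    start = time.perf_counter()
    try:
        device.execute(gates)
    except StopAfterGates:
        pass  # expected: all gates were applied, post-processing was skipped
    return time.perf_counter() - start

dev = GateOnlyDevice()
elapsed = time_gates_only(dev, ["PauliX", "PauliX"])
print(dev.log)
```

The timed call still goes through the same `execute` entry point a user would hit, so interface overhead is included, while the measurement stops before any expectation value is computed. Raising and catching the exception adds a small cost of its own.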

Roger-luo commented 4 years ago

oh thanks for the review! that's a clever hack :-) I'll merge this first then, and I'll cc you after I re-run all the benchmarks this weekend. https://github.com/Roger-luo/quantum-benchmarks/issues/8

> So my suggestion is to either not include PennyLane in the gate comparison or, if that is okay for you, to circumvent the user interface to access the raw gate application times.

I'm happy to add whatever kind of benchmark you guys think is more fair since I'm not very familiar with PennyLane's code base.

The goal of the single-gate benchmark is to show the overhead of each single "simulated" instruction (from the user interface, but that is not a strong requirement), and the goal of the parameterized circuit benchmark is to show the overhead of creating abstractions over quantum circuits.


PS. Yes, it is not very fair to PennyLane, since its default backend is not designed to be performant IIRC and error handling has an overhead. I'm happy to have some comments in the benchmark results to let people know this, if you would like to add some.

I'm also thinking of adding a benchmark on a complete training process which includes AD later (probably after the first version of our paper comes out), which should be more fair to PennyLane I guess. But do you know which PennyLane backend is currently the most performant? Then I could get rid of the overhead of circuit simulation and only show the overhead of the AD engine and abstractions.

> As PennyLane is primarily a library for gradient propagation, every such quantum function must return either an expectation value, a variance or a sample.

Does PennyLane currently have reverse-mode AD (I didn't find it in the paper tho)? Or does it only support forward mode (aka faithful quantum gradients)? I just heard that reverse-mode AD is supported in Strawberry Fields, for CV only?

co9olguy commented 4 years ago

@Roger-luo: in the current version of PennyLane, all autodiff interfaces see the quantum circuits as black boxes (i.e., no backpropagation through the quantum circuit). We provide custom gradients by using the (hardware-compatible) parameter shift method.

However, we're currently close to finishing an update to the AD capabilities that will allow backprop on the built-in simulator (recognizing that this is faster for simulators).
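For readers unfamiliar with the parameter shift method mentioned above, here is a minimal sketch on a one-parameter example. It assumes a gate of the form exp(-i θ P / 2) with P a Pauli operator, for which the shift is π/2, and it uses the analytic result that the expectation value of Z after RX(θ) on |0> is cos(θ) instead of running a simulator. The function names are made up for illustration.

```python
import math

def expval_z_after_rx(theta):
    """<Z> after RX(theta) acting on |0>; analytically cos(theta)."""
    return math.cos(theta)

def parameter_shift_gradient(circuit, theta):
    """Gradient from two evaluations of the circuit as a black box.

    Assumption: the parameter enters through a Pauli rotation
    exp(-i * theta * P / 2), for which the shift is pi / 2.
    """
    shift = math.pi / 2
    return (circuit(theta + shift) - circuit(theta - shift)) / 2

theta = 0.3
print(parameter_shift_gradient(expval_z_after_rx, theta))
print(-math.sin(theta))  # analytic derivative of cos(theta), for comparison
```

Only the circuit's outputs at shifted parameter values are needed, which is why the same rule can be evaluated on hardware, at the price of two circuit runs per parameter.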

co9olguy commented 4 years ago

Just out of curiosity, what do you mean by "faithful quantum gradients" and how is it related to forward-mode AD?

Roger-luo commented 4 years ago

> Just out of curiosity, what do you mean by "faithful quantum gradients" and how is it related to forward-mode AD?

I mean the phase kicking mentioned in the QCL paper; "faithful" means runnable on a real quantum device. Forward-mode AD is the same as symbolic AD, and phase kicking can be derived from symbolic AD. The details might be easier to explain on a blackboard; I could explain this in person when I'm in Toronto on the 24th.

> in the current version of PennyLane, all autodiff interfaces see the quantum circuits as black boxes (i.e., no backpropagation through the quantum circuit). We provide custom gradients by using the (hardware-compatible) parameter shift method.
>
> However, we're currently close to finishing an update to the AD capabilities that will allow backprop on the built-in simulator (recognizing that this is faster for simulators)

I see. I guess I'll just say in the paper that PL supports forward mode and that reverse-mode support is a work in progress.

johannesjmeyer commented 4 years ago

What is symbolic AD? Can you point me to a reference here? :)

And actually the default backend of PennyLane underwent some major improvements and is now really quick.

It would be good if you mentioned that PennyLane has a certain overhead not present in the other frameworks, since it is intended for the calculation of gradients.

Roger-luo commented 4 years ago

Sorry, I should have said symbolic differentiation; there are a lot of course slides online if you just search for this.

We could update this benchmark once you finish the new default backend. That's why it's public on GitHub: it shows why something is slow and how to improve it. I think we've helped the qulacs people locate the SIMD issue from this benchmark. Yao just provides a baseline here with its generic implementation.