quil-lang / quilc

The optimizing Quil compiler.
Apache License 2.0
460 stars 72 forks source link

quilc charges double for `RY(pi)` #829

Open braised-babbage opened 2 years ago

braised-babbage commented 2 years ago

A single RY gate translates to two RX gates, when we could get away with just one.

QUIL> (print-parsed-program (compiler-hook (parse-quil "RY(pi) 0") (build-nq-linear-chip 2)))
RZ(2.2216554087607694) 0                # Entering rewiring: #(0 1)
RX(pi/2) 0
RZ(pi) 0
RX(-pi/2) 0
RZ(2.2216554087607694) 0
HALT                                    # Exiting rewiring: #(0 1)
NIL
braised-babbage commented 2 years ago

Just a couple of notes from inspection of *compiler-noise*

Here are three ways to get the shorter decomposition (there may be more):

1.

(define-compiler Y-to-ZXZ ((y-gate ("RY" (#.pi) q)))
  (inst "RZ" '(#.-pi/2) q)
  (inst "RX" '(#.pi) q)
  (inst "RZ" '(#.pi/2) q))

will allow the addresser to emit better code for the chip. The compressor will subsequently respect this, since it only uses decompiled code if it's no worse than the original.

  1. (Not suggesting this is a good idea) If we get rid of all but euler-zxz-compiler, then the compressor uses this for decompilation, which happens to be the preferred method for this chip.

  2. We could add some rewriting rules that are tailored to standard patterns that occur in compiler outputs (e.g. a ZXZXZ sequence) and check conditions on the angles to reduce these to known, simpler sequences. I am not a huge fan of this, but it's at least of the flavor of "adding more information" to quilc.

braised-babbage commented 2 years ago

For what it's worth, I don't view the current behavior as being a real problem, in the sense that I know that on typical programs the overhead will not be a big deal (I nice guarantee from any flavor of Euler decompilation). However, it's a special enough case that people will naturally try it, so it would be nice for quilc to get it right. I think the way in which things go awry in the above example is interesting in the sense that it's a reminder of how some of this machinery fits together, and where things can get lost at the seams.

markasoftware commented 2 years ago

I ultimately agree that it doesn't seem to be a major issue -- usually the ZXZ decomposition is not applicable.

markasoftware commented 2 years ago

For anyone interested in how the compiler chooses which Euler decompositions to support: At the time the chip-spec is constructed, the compiler tries to find the best way to compile an arbitrary gate down to native gates (see compute-applicable-compilers and find-shortest-compiler-path). This happens to be the ZYZ Euler decomposition, followed by an RY-to-XZX decomposition to eliminate the inner Y.

markasoftware commented 2 years ago

After thinking more, adding a Y-to-ZXZ compiler as Erik suggested seems the best solution in my eyes. However, since Y-to-ZXZ is actually a special case of the Euler ZXZ transform, it may be valuable to add a general mechanism for creating specialized versions of a compiler. Eg:

(define-compiler FOO-to-BAR
    ((my-gate ("FOO" (theta) q))
     :specialize '((FOO-to-BAR-2pi (("FOO" (#.2pi) q)))
                   (FOO-to-BAR-pi (("FOO" (#.pi) q)))))
  (inst "BAR" (/ theta 2) q))

Would also define compilers FOO-to-BAR-2pi and FOO-to-BAR-pi, with outputs precomputed according to the input gates specified.

Then one could specialize the Euler compilers to add special cases for RY(pi) and similar.

The nativization will pick up the specialized compilers because their output may lie in the target gateset even when the more generic compiler's output does not.

Any thoughts?