quil-lang / quilc

The optimizing Quil compiler.
Apache License 2.0

"Efficient template matching in quantum circuits" paper #486

Open stylewarning opened 4 years ago

stylewarning commented 4 years ago

There's a wonderful paper by Raban Iten, David Sutter, Stefan Woerner on compressor-like functionality for quantum circuits. At minimum, we should review it. At maximum, we should implement it. What do you folks think?

ecpeterson commented 4 years ago

I think that their method is superior wherever it overlaps ours (viz., peephole rewriting of input circuits of a fixed shape onto output circuits of a fixed shape). However, the compressor broadly does additional things that I think will take significant effort to fit into their framework, or which I don't even see how to fit:

  1. The compressor selects unstructured subgraphs of instructions to apply its (instructions -> single matrix -> instructions) logic to. I feel like this is something their searching methods could be modified to handle, but I don't think it's spelled out and it's not clear to me how to intelligently do it in tandem with the more usual rewrites.
  2. One of their selling points is that their method allows for template reuse: for example, if U1 ... Un = 1 is recorded as a template and Ui ... Uj is found as a subsequence, then it can be replaced by U(i-1)^-1 ... U1^-1 Un^-1 ... U(j+1)^-1 (& similar generalizations for subgraphs). It's less clear to me to what degree this really has legs when working with fancier "templates", like those with parametric gates, or those which accept a wide variety of gates and whose output shape is conditional on the precise input given (e.g., the tweedledum permutation compiler).
  3. The compressor optionally tracks partial quantum state through the program. This gets somewhat fussy when de-linearizing the traversal order: given a subgraph of instructions, you have to make sure you've used the entire "lightcone" of instructions behind a given subgraph to generate the present state. Making sure that this is so, that this is stable under graph transformations, & that it can be partially memoized will require care.
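The identity-template substitution in item 2 can be sanity-checked numerically. Below is a minimal NumPy sketch (not quilc code), treating the products as matrix products in the written order: build a length-4 template U1 U2 U3 U4 = 1 by construction, then verify that the subsequence U2 U3 equals the inverse-sequence replacement U1^-1 U4^-1 that the formula prescribes for i = 2, j = 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(dim=2):
    # QR decomposition of a random complex matrix, with the phases of R's
    # diagonal folded back in, yields a random unitary
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

# Build an identity template U1 U2 U3 U4 = 1 by construction
U1, U2, U3 = random_unitary(), random_unitary(), random_unitary()
U4 = np.linalg.inv(U1 @ U2 @ U3)
assert np.allclose(U1 @ U2 @ U3 @ U4, np.eye(2))

# The found subsequence Ui ... Uj with i = 2, j = 3 may be replaced by
# U(i-1)^-1 ... U1^-1 Un^-1 ... U(j+1)^-1, here U1^-1 U4^-1
replacement = np.linalg.inv(U1) @ np.linalg.inv(U4)
assert np.allclose(U2 @ U3, replacement)
```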

It's possible to dodge all 3 of these by retaining the compressor logic and only swapping out the peephole optimizer for their method. However, I think that their method is robust enough and has enough overlap with the broader compressor that I'd want to put serious thought into how much of the compressor can be outright replaced. Of course, I would also want to have proposed partial solutions to each of the items above before proceeding with an implementation.
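For concreteness, the fixed-shape peephole rewriting mentioned above is the kind of transformation exemplified by the standard identity H Z H = X: a three-gate pattern of fixed shape replaced by a single gate. A quick NumPy check (illustrative only, not quilc's representation):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard
Z = np.diag([1.0, -1.0])                      # Pauli Z
X = np.array([[0.0, 1.0], [1.0, 0.0]])        # Pauli X

# The fixed-shape pattern H·Z·H rewrites to the single gate X
assert np.allclose(H @ Z @ H, X)
```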

ecpeterson commented 4 years ago
  1. I think it's possible to extract such graphs from the canonical graph. Start with a set of instructions S, and iteratively enlarge it by (1) unreachable instructions (in the undirected sense) which consume no more resources; (2) instructions which are direct predecessors which consume no more resources; (3) instructions which are direct successors which consume no more resources; (4) instructions which are either direct successors or direct predecessors and which enlarge the resource set in some controlled way (say: up to some bound on the size). I think a reasonable strategy would be to start with a single instruction, resolve (1) first, then alternately resolve (2) and (3), then add a single instruction of type (4) which increases the overall resource count minimally**, then repeat until (4) strikes some bound.

  2. This seems like it might be a red herring. Honestly, how often are the tweedledum compilers called during optimization rather than nativization? I'd advise ignoring this concern until it actually becomes a nuisance.

  3. The canonical graph is actually well-adapted to reasoning about this lightcone: if each instruction is tagged with the AQVM state at its "start", then modifications to any node in the graph will invalidate any cached information only on its (perhaps indirect) successors.

** - The best choice will be hard to compute. The heuristic I have in mind is that the best graph generated by this procedure is the largest one (which already isn't universally true), and predicting whether the inclusion of a given node will ultimately induce the graph to grow larger than the inclusion of some alternative node doesn't seem easy.
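The enlargement procedure in item 1 can be sketched as a greedy loop. The Python below is a toy sketch under loud assumptions: instructions carry only qubit sets as resources, and "neighbors" are approximated as any pair of instructions sharing a qubit, rather than the true direct predecessor/successor edges of the canonical graph. It implements rules (2)/(3) as a saturating absorption pass and rule (4) as a minimal-growth step bounded by a qubit count.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    # hypothetical stand-in for a real instruction node
    name: str
    qubits: frozenset

def neighbors(prog, i):
    # crude approximation: instructions sharing a qubit are "adjacent";
    # the real canonical graph tracks direct predecessors/successors
    for j in range(len(prog)):
        if j != i and prog[i].qubits & prog[j].qubits:
            yield j

def grow_subgraph(prog, seed, qubit_bound):
    chosen = {seed}
    resources = set(prog[seed].qubits)
    while True:
        # rules (2)/(3): absorb neighbors consuming no new resources
        changed = True
        while changed:
            changed = False
            for i in list(chosen):
                for j in neighbors(prog, i):
                    if j not in chosen and prog[j].qubits <= resources:
                        chosen.add(j)
                        changed = True
        # rule (4): add the neighbor enlarging the resource set least,
        # stopping once the bound would be exceeded
        best = None
        for i in chosen:
            for j in neighbors(prog, i):
                if j in chosen:
                    continue
                extra = len(prog[j].qubits - resources)
                if best is None or extra < best[0]:
                    best = (extra, j)
        if best is None or len(resources) + best[0] > qubit_bound:
            return chosen
        resources |= prog[best[1]].qubits
        chosen.add(best[1])

# toy program: seeded at the H on qubit 0 with a bound of 2 qubits, the
# subgraph grows to cover instructions 0-2 but excludes the CZ on 1, 2
prog = [Instr("H", frozenset({0})),
        Instr("CZ", frozenset({0, 1})),
        Instr("H", frozenset({1})),
        Instr("CZ", frozenset({1, 2}))]
assert grow_subgraph(prog, seed=0, qubit_bound=2) == {0, 1, 2}
```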

markasoftware commented 1 year ago

I believe section 5.4.2 of the paper directly addresses your point (1), @ecpeterson.

ecpeterson commented 1 year ago

So it does!