Open tanujkhattar opened 4 months ago
> re: Adjoint: I consider this a pretty fundamental limitation of the existing t-counting protocol which subverts the decomposition hierarchy. Costs should depend on the costs of the callees.
Well, the existing t-counting protocol says either (a) "you provide me the cost yourself" or (b) "I'll decompose you and try to figure it out myself". In cirq land, the `_InverseCompositeBloq` (the equivalent of the `Adjoint` bloq) did not implement the `_t_complexity_` method, and therefore for an adjoint bloq we'd always decompose and figure out the cost instead of relying on the hardcoded values. If we simply delete the implementation of `Adjoint._t_complexity_`, then everything will work without needing to add a new `adjoint: bool` parameter, because the protocol will figure out the cost of the bloq using the cost of its callees.
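The two-step fallback described above can be sketched with toy classes (this is an illustrative sketch, not Qualtran's actual code; `callees()` is a hypothetical stand-in for the decomposition hook):

```python
from dataclasses import dataclass


@dataclass
class TComplexity:
    t: int = 0

    def __add__(self, other: "TComplexity") -> "TComplexity":
        return TComplexity(self.t + other.t)


class Bloq:
    def callees(self):
        """Hypothetical stand-in for a bloq's decomposition / call graph."""
        return []

    def t_complexity(self) -> TComplexity:
        hardcoded = getattr(self, "_t_complexity_", None)
        if hardcoded is not None:
            return hardcoded()  # (a) user-provided hardcoded cost
        # (b) no hardcoded cost: derive it from the callees.
        total = TComplexity()
        for callee in self.callees():
            total = total + callee.t_complexity()
        return total


class TGate(Bloq):
    def _t_complexity_(self) -> TComplexity:
        return TComplexity(t=1)


class Adjoint(Bloq):
    """Deliberately has no `_t_complexity_`: its cost always comes from the subbloq."""

    def __init__(self, subbloq: Bloq):
        self.subbloq = subbloq

    def callees(self):
        return [self.subbloq]
```

With `Adjoint._t_complexity_` absent, `Adjoint(TGate()).t_complexity()` falls through to step (b) and recurses into the callee, so the cost comes from the decomposition rather than a hardcoded value.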
The issue arises in the scenarios where we want all of the following:

1. A way for users to hardcode the cost of a `bloq` by overriding `_t_complexity_`.
2. A way for users to also hardcode the more complicated cases where the `bloq` is wrapped inside a meta bloq, for example `Adjoint(bloq)` and `Controlled(bloq)`. Right now (traditionally, in Cirq-FT) this case is handled by users explicitly overriding the `adjoint(self)` and `controlled(self)` methods to return custom bloqs in which the hardcoded formulas have the right context and continue to work. An example is the `And` bloq.
3. The problem with `MultiAnd` is that we want to treat it as a "non-special" bloq and not have users override the `adjoint` method, but we somehow also want to communicate to the protocol (`MultiAnd._t_complexity_`) the context that the bloq is wrapped inside a meta bloq.
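Point (2), the `And`-style pattern, can be illustrated with a toy sketch (hypothetical stand-in classes, not Qualtran's actual implementation; the 4-vs-0 T-counts are the well-known costs of the compute/uncompute `And` pair):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class And:
    # Flag baked into the bloq itself, so the hardcoded formula
    # always has the right context when the protocol calls it.
    uncompute: bool = False

    def _t_complexity_(self) -> int:
        # Forward And: 4 T gates; measurement-based uncompute: 0 T gates.
        return 0 if self.uncompute else 4

    def adjoint(self) -> "And":
        # Overriding adjoint() returns a *custom bloq* (the uncompute
        # variant) rather than a generic Adjoint wrapper.
        return And(uncompute=not self.uncompute)
```

Because `adjoint()` returns a concrete `And(uncompute=True)` instead of a generic wrapper, `_t_complexity_` never needs to be told about any surrounding meta bloq.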
Point (3) is new and needs more design. This has nothing to do with the t-complexity protocol subverting the decomposition hierarchy. It's an optional optimization in step (2) which can be avoided by deleting the implementation of `_t_complexity_`.
> It's to support the case where we have the call graph callees but not a full decomposition
That case is already supported after https://github.com/quantumlib/Qualtran/pull/740 and, again, is unrelated to point (3) above.
> It would be impossible to override `_t_complexity_` on `MultiAnd` and have it be correct
The design question that needs attention here is whether we want protocols like `_t_complexity_` to accept information about the meta bloq that wraps the bloq. In the current framework, the way to make `_t_complexity_` work is either:

a) Return a custom `MultiAnd(adjoint=True)` (similar to what we do for `And`) so users have the context about whether the bloq is adjoint or not when overriding `_t_complexity_`.

b) Return an `Adjoint(MultiAnd())`, but `Adjoint._t_complexity_` uses the decomposition and does not directly use the hardcoded `subbloq._t_complexity_`, because the t-complexity for the sub-bloq was defined for the forward case and not the adjoint case.
We earlier used to do (a) and now we do (b) for `MultiAnd`; both are correct and are currently well-supported approaches. The philosophy in this situation is that if a user needs to override a custom T-complexity formula for a specific bloq variant (like adjoint or controlled), they should return a new custom bloq by overriding `adjoint(self)` and `controlled(self)`, and the new custom bloq would have the right t-complexity formula (or other protocols) overridden.
It seems like in your original approach you wanted to do:

c) `_t_complexity_(self)` gets one or more arguments (e.g. `adjoint: bool`, `ctrl: CtrlSpec`, etc.) that provide the bloq with information about the meta bloq that calls it, and users can use this information to hardcode gate-count formulas of the bloq for that case. This is a new and slightly weird use case, and I think it's worth discussing the implications of this design in more detail.
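Option (c) would change the protocol's signature roughly as below. This is a hypothetical sketch of the proposal, not an existing API, and the `4 * (n - 1)` formula is only an illustrative stand-in for a `MultiAnd`-style ladder of `And` gates:

```python
from typing import Any, Optional


class MultiAnd:
    def __init__(self, n: int):
        self.n = n  # number of input bits being AND-ed together

    def _t_complexity_(self, adjoint: bool = False,
                       ctrl: Optional[Any] = None) -> int:
        # The meta-bloq context (adjoint / ctrl) is passed *into* the
        # hardcoded formula, instead of being baked into a variant bloq.
        # Illustrative costs: a ladder of (n - 1) And gates at 4 T each
        # forward, 0 T for the measurement-based adjoint.
        return 0 if adjoint else 4 * (self.n - 1)
```

The "weirdness" the comment refers to is visible here: the formula's author must anticipate every wrapping meta bloq the protocol might report, which is exactly the leaky abstraction objected to below.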
No, I don't think the bloq author has the authority to annotate a bloq with T counts unless they are direct callees. It's a leaky abstraction when you jump levels. I guess this is most similar to (b); but it begs the question why even have this method when we can never use it
> It's a leaky abstraction when you jump levels.

I agree. We should not have (c).
> when we can never use it

"Never use it" is a mischaracterization, I think. We (can) use the hardcoded values for all forward cases and non-trivial adjoints / controls, which form a large part of the codebase.
> why even have this method

There are two goals, I think:

1. Ease of access: users get `bloq.logical_counts()`, which just gives them the logical counts. The ease of access of the API is worth it, since it's so commonly used.
2. Letting bloq authors hardcode `logical_counts` directly. The primary advantage here is performance optimization, especially when the call graph is deep and/or expensive to construct. The secondary advantage is verification: that the analytical formula for T-counts matches the value obtained by recursively combining values from the decomposition / call graph. You'd have to guarantee that nothing in the bloq's call graph has, or will ever have, a custom adjoint or controlled implementation (that changes the T count).
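The verification advantage in goal (2) amounts to a consistency check between a hardcoded formula and the decomposition-derived count. A minimal toy sketch (hypothetical classes, not Qualtran's API):

```python
class Leaf:
    """A bloq whose T-count is a primitive, trusted value."""

    def __init__(self, t_count: int):
        self._t = t_count

    def t_count(self) -> int:
        return self._t

    def callees(self):
        return []


class Composite:
    """A bloq with a hardcoded analytical T-count and a decomposition."""

    def __init__(self, hardcoded_t: int, callees):
        self._t = hardcoded_t
        self._callees = callees

    def t_count(self) -> int:
        return self._t

    def callees(self):
        return self._callees


def t_count_from_callees(bloq) -> int:
    # Recompute the count by summing over the decomposition.
    return sum(c.t_count() for c in bloq.callees())


def verify_hardcoded(bloq) -> bool:
    # Does the analytical formula agree with the callee-derived value?
    return bloq.t_count() == t_count_from_callees(bloq)
```

As the comment notes, such a check only stays valid as long as nothing in the call graph grows a custom adjoint or controlled implementation that changes the count.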
Right now, the `TComplexity` protocol is broken as described in https://github.com/quantumlib/Qualtran/pull/732. This issue is to discuss the future of the `TComplexity` protocol, since there have been informal discussions and upgrades to either deprecate it in favour of the new generalized costs framework or make improvements to it and make it better supported.

First, I'd like to highlight that the `TComplexity` protocol, or an equivalent `LogicalCounts` protocol that captures the most commonly used logical counts people care about when thinking about fault-tolerant resource estimation, is extremely important, and we should have an easy way (read: a single function call) to obtain this value for any given `bloq`. The `TComplexity` / `LogicalCounts` class is also the input format used to invoke the Microsoft Azure resource estimator.

The current way to obtain this value without going via the t-complexity protocol is: a) compute `sigma` via `bloq.call_graph()`, b) call `t_counts_from_sigma(sigma)`, assuming `sigma` was computed without any fancy parameters (e.g. no truncation of the call graph using the `max_depth` parameter, etc.).

We need to have a single function call (e.g. `bloq.t_complexity()` or `bloq.logical_counts()`) that gives the desired counts. Inspecting the call graph should be done only when bloq users need to get into the weeds of how exactly the logical counts were computed. Having agreed on this premise, let's lay down some requirements (I'll refer to the protocol as `TComplexity` for now, but we should probably change it to `logical_counts` in the future):

a) The most important requirement is that it needs to be _fast_ for the general case. Expecting users to override `build_call_graph` should be the recommended way, but if users do not provide this decomposition, we should still be able to support fast computation of logical counts. The cirq-style `t_complexity` uses a global cache to store and retrieve the logical counts of bloqs. This has the advantage of reusing any structure the bloq decomposition has, and ends up being sufficient for a majority of cases. This is because, a lot of the time, the very high gate counts are present because we do phase estimation on a unitary `U`, which ends up repeating `U` a lot of times.

b) It should report more than just T-counts and should be sufficient as an "input" to our qec overhead methods defined in `bloqs/surface_code`. Some other useful logical counts are: (a) Toffoli counts, (b) qubit usage, (c) counts of (rotation, eps) pairs instead of just the number of rotations, (d) rotation depth (the Azure model uses this).

Design options going forward:

(a) Delegate computation of `LogicalCounts` to the bloq call graph -> bloq counts construction. The biggest bottleneck here is performance, I believe. For example: I implemented a naive version of this approach in https://github.com/quantumlib/Qualtran/pull/732, specifically `_t_counts_for_bloq` called from `_populate_flame_graph_data`, and it's VERY slow. The `phase_estimation_of_quantum_walk` notebook is a good example of bloqs for which I want to compute the T-complexity and plot flame graphs. I can do the former easily, but the latter can be done only for very small cases. The biggest reason for the slowdown is that I need the `sigma` for each node of the call graph and not just the root node, so every time I visit a node in the call graph, I do an `O(n)` computation to compute the `sigma`.

(b) We improve the cirq-style `t_complexity` method to not compute an explicit call graph and instead directly, implicitly, compute only the logical counts we care about. We've already demonstrated that it's pretty fast. There are some design challenges here; for example, `Adjoint(subbloq)` expects `subbloq` to accept an `adjoint` parameter when `subbloq` does not have a custom `adjoint` implemented, but not when `subbloq` provides a custom implementation of `adjoint`. This seems like a design smell and is confusing at best (e.g. if a bloq `A` decomposes into a bloq `B` where `B` has a custom adjoint but `A` does not, then `A._t_complexity_` should accept an `adjoint=True` but not pass on the flag to `B._t_complexity_`. `A` and `B` are the `MultiAnd` and `And` gates right now.)

cc @mpharrigan
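Requirement (b) above suggests a richer counts container. A hypothetical `LogicalCounts` sketch along those lines (field names and the aggregation rules are illustrative assumptions, not an existing Qualtran class):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LogicalCounts:
    t: int = 0
    toffoli: int = 0
    qubits: int = 0
    # (rotation kind, eps) -> count, instead of a single rotation tally.
    rotations: Dict[Tuple[str, float], int] = field(default_factory=dict)
    rotation_depth: int = 0  # the Azure model uses this

    def __add__(self, other: "LogicalCounts") -> "LogicalCounts":
        rot = dict(self.rotations)
        for key, count in other.rotations.items():
            rot[key] = rot.get(key, 0) + count
        return LogicalCounts(
            t=self.t + other.t,
            toffoli=self.toffoli + other.toffoli,
            # Modeling assumption: serial composition reuses qubits,
            # so qubit usage combines via max rather than sum.
            qubits=max(self.qubits, other.qubits),
            rotations=rot,
            rotation_depth=self.rotation_depth + other.rotation_depth,
        )
```

Making the container additive means the per-node aggregation in design option (a), or the implicit accumulation in option (b), reduces to summing these structs up the call graph.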