tanujkhattar commented 1 year ago

RFC

Background

In Cirq, we apply gates on qubits to get operations, which can then be inserted into a container (Circuit) to construct a (implicit) compute graph. This pattern assumes that it’s straightforward for the users to identify the qubits on which a gate should act to yield operations.

However, as we gradually move beyond the NISQ era, this assumption faces a number of challenges listed below:

When constructing large circuits using gadgets that can allocate temporary ancillas (eg: an AND gate as in https://arxiv.org/abs/1805.03662), it often becomes very tedious for users to correctly keep track of all the ancillas needed in the system.
There are also gadgets which can borrow dirty ancillas from the rest of the system and guarantee to give them back in the same state (eg: https://arxiv.org/abs/1812.00954), which makes it hard for users to manually identify and specify the borrowable qubits on which these gadgets should act upon.
Lastly, different qubit allocation strategies for the ancilla qubits can lead to a tradeoff between qubit width and circuit depth -- for example, you could reuse the same ancillas for different operations at the cost of increased circuit depth OR you could allocate new ancillas for different operations (acting on different target qubits) and reduce the circuit depth at the cost of higher qubit count.

The goal of this design discussion is to propose a mechanism to enable cirq users to overcome the above mentioned challenges and more easily construct large circuits.

Key Observations

Qubit allocations can introduce new implicit dependencies between operations

In Cirq, two operations have a dependency in the underlying implicit compute graph if they share the same qubit.
We can categorize the dependencies between operations as explicit or implicit - an explicit dependency is one specified by the user when constructing the compute graph. These dependencies are informed only by the qubit registers which represent input / output to the algorithm. Any additional temporary workspace that needs to be allocated as part of executing the algorithm is ignored when thinking about the explicit dependencies.
Implicit dependencies, on the other hand, are dependencies that are introduced when routing a logical compute graph on a specific set of qubits, thus making concrete choices about which qubits should be allocated for the temporary workspace.
Making decisions about the qubit allocations for implicit dependencies becomes much easier if we have the entire logical compute graph to inspect. However, if we add an abstraction to Cirq that allocates the final qubits at the time of circuit construction, we are instead forced to make this decision while computing the logical compute graph. This makes it much harder to provide a flexible framework which can deal with the different space-time tradeoffs.

Borrowing qubits cannot be supported during circuit construction

Some quantum algorithms can use qubits without relying on their exact state, and without requiring that they are unentangled with the rest of the system. If there are qubits that are already allocated but not touched during parts of a subroutine, those qubits can be borrowed for use by such an algorithm instead of allocating additional quantum memory (eg: https://arxiv.org/abs/1812.00954)
Note that when borrowing qubits, one needs to make sure that the in-use qubits that are borrowed are guaranteed not to be used by the program from the time the qubit is borrowed until the last use of that qubit.
This implies that a qubit allocator cannot decide which qubits can be borrowed till it knows when / where the borrowed qubits get freed. As a result, a qubit allocator cannot return an already allocated qubit when the user requests it to “borrow”’ a qubit.
Thus, to support the borrowing operation, we would likely need to add a post-process transformer which can replace additional “borrowable” qubits allocated during circuit construction with actual system qubits and thus reduce the circuit width.

Cirq fails if a gate decomposes into an OP_TREE with operations on newly allocated ancilla qubits.

Cirq gates need to define the number of qubits that they act upon by exposing _num_qubits_.
Ideally, we'd like to define composite gates/operations that can implicitly allocate ancillas as part of their _decompose_ protocol to achieve the desired effect without exposing all the gory details.
This pattern currently leads to multiple failures with simulators.

Design Details

Using the key observations above, the proposed high level design is as follows:

Circuit construction

Implement global qalloc, qborrow and qfree methods which delegate calls to a QubitManager. The choice of which qubit manager to delegate to can be controlled by a context manager.
- The default prescription would be to use a SimpleQubitManager + post-processing transformers but the framework is flexible enough to directly support other heuristic qubit allocation strategies at the time of circuit construction itself.
By default, we use a SimpleQubitManager that always allocates a new qubit for each qalloc / qborrow call. This ensures that no implicit data dependencies are introduced at the time of circuit construction due to qubit allocation. In other words, the simple qubit manager uses an allocation strategy that maximises qubit width and minimized circuit depth.
- In order to be able to easily identify which qubits were allocated by the SimpleQubitManager, we introduce two new internal qubit types - CleanQubit and BorrowableQubit.
Provide implementations of post-process transformers which inspect circuits constructed above, containing CleanQubit/BorrowableQubits, and perform a smarter qubit allocation. At this step, different post processing transformers can use different qubit width / circuit depth tradeoff strategies and also inspect the compute graph to automatically support borrowing qubits.

Circuit Simulations

Suppose we construct a circuit that contains an operation defined using it's _decompose_ protocol and that uses additional ancillas invoked using qalloc / qfree. The simulators would raise an error right now if we try to simulate such an operation because memory for a simulation is allocated in advance by inspecting cirq.num_qubits(op), and as a consequence the simulators expect an operation to decompose into an OP_TREE acting only a subset of op.qubits.

To mitigate this issue, one easy option is to recursively decompose the input circuit till we get to a point where the resulting circuit is a "trivial" circuit, i.e. has no more implicit allocations / deallocations inside the decompose protocol of operations. We can introduce additional protocols to keep track of which operation is "trivial" or not so we only decompose the operations that have a hidden allocation.

Another option that comes to mind, but I'm not fully sure if we can implement efficiently in the existing cirq simulator infrastructure, is to come up with a way of computing the effect of a "non-trivial" operation (one that has hidden allocations) only on the qubits that it expects as input (for example, by computing a reduced density matrix). This should be possible because there's an implicit promise that the allocated ancillas should be cleaned up when the user calls qfree(q) and therefore an outer simulator should not have to care about them. cc @daxfohl I'm curious to hear your thoughts on this.

Example code snippets

Simple Qubit Manager with post-process transformer

```python class GateAllocAndBorrowInDecompose(cirq.Gate): def __init__(self, num_alloc: int = 1): self.num_alloc = num_alloc def _num_qubits_(self) -> int: return 1 def __str__(self) -> str: return 'TestGate' def _decompose_(self, qubits): qa, qb = cqa.qalloc(self.num_alloc), cqa.qborrow(self.num_alloc) for q, b in zip(qa, qb): yield cirq.CSWAP(qubits[0], q, b) yield cirq.qft(*qb).controlled_by(qubits[0]) for q, b in zip(qa, qb): yield cirq.CSWAP(qubits[0], q, b) cqa.qfree(qa + qb) def test_decompose_and_allocate_qubits(): with cqa.memory_management_context(): op = GateAllocAndBorrowInDecompose(3).on(cirq.NamedQubit("original")) extra = cirq.LineQubit.range(3) circuit = cirq.Circuit(cirq.H.on_each(*extra), cirq.Moment(op), cirq.decompose_once(op)) cirq.testing.assert_has_diagram( circuit, """ _b0: ───────────────────────×───────────qft───×─────────── │ │ │ _b1: ───────────────────────┼───×───────#2────┼───×─────── │ │ │ │ │ _b2: ───────────────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ _c0: ───────────────────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ _c1: ───────────────────────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ _c2: ───────────────────────┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ 0: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ 1: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ 2: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ original: ───────TestGate───@───@───@───@─────@───@───@─── """, ) allocated_circuit = cqa.decompose_and_allocate_qubits(circuit, decompose=lambda op: op) cirq.testing.assert_has_diagram( allocated_circuit, """ 0: ───────────H──────────────×───────────qft───×─────────── │ │ │ 1: ───────────H──────────────┼───×───────#2────┼───×─────── │ │ │ │ │ 2: ───────────H──────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ ancilla_0: ──────────────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ ancilla_1: ──────────────────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ ancilla_2: ──────────────────┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ original: ────────TestGate───@───@───@───@─────@───@───@─── """, ) def decompose_func(op): return ( cirq.decompose_once(op) if isinstance(op.gate, GateAllocAndBorrowInDecompose) else op ) allocated_and_decomposed_circuit = cqa.decompose_and_allocate_qubits( circuit, decompose=decompose_func ) cirq.testing.assert_has_diagram( allocated_and_decomposed_circuit, """ 0: ───────────H───×───────────qft───×───────────×───────────qft───×─────────── │ │ │ │ │ │ 1: ───────────H───┼───×───────#2────┼───×───────┼───×───────#2────┼───×─────── │ │ │ │ │ │ │ │ │ │ 2: ───────────H───┼───┼───×───#3────┼───┼───×───┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_0: ───────×───┼───┼───┼─────×───┼───┼───×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_1: ───────┼───×───┼───┼─────┼───×───┼───┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_2: ───────┼───┼───×───┼─────┼───┼───×───┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ original: ────────@───@───@───@─────@───@───@───@───@───@───@─────@───@───@─── """, ) # Since we preserve moment structure, if TestGate is in the first moment then # we end up allocating 6 ancilla to not introduce any additional data dependency # due to our allocation strategy and thus not impact the circuit depth. allocated_and_decomposed_circuit = cqa.decompose_and_allocate_qubits( cirq.align_left(circuit), decompose=decompose_func ) cirq.testing.assert_has_diagram( allocated_and_decomposed_circuit, """ 0: ───────────H─────────────────────────────×───────────qft───×─────────── │ │ │ 1: ───────────H─────────────────────────────┼───×───────#2────┼───×─────── │ │ │ │ │ 2: ───────────H─────────────────────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ ancilla_0: ───×───────────qft───×───────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ │ │ │ ancilla_1: ───┼───×───────#2────┼───×───────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ │ │ │ │ │ ancilla_2: ───┼───┼───×───#3────┼───┼───×───┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_3: ───×───┼───┼───┼─────×───┼───┼───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_4: ───┼───×───┼───┼─────┼───×───┼───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_5: ───┼───┼───×───┼─────┼───┼───×───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ original: ────@───@───@───@─────@───@───@───@───@───@───@─────@───@───@─── """, ) ```

Greedy Qubit Manager

```python class GateAllocInDecompose(cirq.Gate): def __init__(self, num_alloc: int = 1): self.num_alloc = num_alloc def _num_qubits_(self) -> int: return 1 def _decompose_(self, qubits): for q in cqa.qalloc(self.num_alloc): yield cirq.CNOT(qubits[0], q) cqa.qfree([q]) def test_greedy_qubit_manager(): def make_circuit(): q = cirq.LineQubit.range(2) g = GateAllocInDecompose(1) circuit = cirq.Circuit(cirq.decompose_once(g.on(q[0])), cirq.decompose_once(g.on(q[1]))) return circuit with cqa.memory_management_context(cqa.GreedyQubitManager(prefix="ancilla", size=1)): # Qubit manager with only 1 managed qubit. Will always repeat the same qubit. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ 0: ───────────@─────── │ 1: ───────────┼───@─── │ │ ancilla_0: ───X───X─── """, ) with cqa.memory_management_context(cqa.GreedyQubitManager(prefix="ancilla", size=2)): # Qubit manager with 2 managed qubits and parallelize=True, tries to minimize adding additional # data dependencies by minimizing reuse. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ ┌──┐ 0: ────────────@───── │ 1: ────────────┼@──── ││ ancilla_0: ────X┼──── │ ancilla_1: ─────X──── └──┘ """, ) with cqa.memory_management_context( cqa.GreedyQubitManager(prefix="ancilla", size=2, parallelize=False) ): # Qubit manager with 2 managed qubits and parallelize=False, tries to minimize reuse by potentially # adding new data dependencies. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ 0: ───────────@─────── │ 1: ───────────┼───@─── │ │ ancilla_1: ───X───X─── """, ) ```

Conclusion

I'd love to hear thoughts of other maintainers and contributors, specifically @daxfohl @dstrain115 @dabacon @maffoo. Once there is consensus, I'll create smaller subtasks which can be taken up by 20%ers and other community contributors.

@NoureldinYosri and @mpharrigan helped with this exploration and have given a round of feedback offline.

Update - 3rd April 2023

See the comments for more discussions on maintaining a global state for qalloc / qborrow vs passing around the qubit_manager as part of the decompose protocol to cirq.Gate and cirq.Operation. Right now, we have a working prototype without maintaining a global state. Here, we list down all the subtasks for implementing this roadmap item, many of which can be worked upon in parallel.

[x] Add a QubitManager interface and a SimpleQubitManager implementation, which always allocates new CleanQubits/BorrowableQubits for each qalloc/qborrow request.
[x] Add a new _decompose_with_qubit_manager_ protocol for both cirq.Gate and cirq.Operation.
[x] Update cirq.decompose and it's variants to first try _decompose_with_qubit_manager_ and fallback on _decompose.
[x] Update cirq protocols to work for cases where an operation allocates new qubits as part of it's decomposition (https://github.com/quantumlib/Cirq/issues/6101)
[x] Update cirq simulators to work for cases where an operation allocates new qubits as part of it's decomposition.
[x] Add a GreedyQubitManager that maximizes/minimizes qubit reuse based on a configurable parameter.
[x] Add a post-processing transformer that maps all CleanQubit/BorrowQubit in a circuit to system qubits using a qubit manager.

shef4 commented 1 year ago

I'm interested in helping out on this project if possible.

95-martin-orion commented 1 year ago

This project reminds me of #4100 and related PRs, which allow simulators to perform implicit qubit (de-)allocation when operating on definite qubits. It seems that the primary features being proposed here are:

The ability to allocate arbitrary qubits outside of simulation, such that a gate can request (q0_1, ancilla) and be assigned an appropriate qubit based on availability (and maybe architecture?).
Various QubitManagers to encapsulate different strategies for allocation.

If I understand correctly, none of this is meant to affect existing, definite-qubit circuits - only new circuits which make use of allocated qubits will be affected. Is this an accurate summary of the proposal?

(Apologies if this is just restating what you've already said - writing it out mostly helped me make sense of it myself.)

tanujkhattar commented 1 year ago

If I understand correctly, none of this is meant to affect existing, definite-qubit circuits - only new circuits which make use of allocated qubits will be affected. Is this an accurate summary of the proposal?

That is absolutely correct! The goal is to provide a framework for users to easily construct circuits composed of composite operations (aka gadgets) that can use additional ancillas "under the hood" (i.e. as part of their decompose protocol).

daxfohl commented 1 year ago

I'm not quite clear, do the allocations occur at runtime, or is there a transformer by which you take a circuit definition that has ancilla-using ops, and transforms that into a fully-defined circuit that you can run?

Knee-jerk reaction is that the latter would be preferable, since the former could lead to space errors halfway through simulation, whereas the latter you know exactly how many qubits you need before simulating. The former may be useful if we have nondeterministic circuits (in which case routing the ancillas in advance may be impossible), but we don't currently have that.

Aside, it may be useful to have a PC memory requirement calculator for simulation, since we should be able to determine the maximum entanglement for a circuit prior to running it.

One interesting use case would be the deferred measurements transformer. Would it be possible to rewrite this in terms of ancillas? Hmm, actually maybe not. The qubits added in that algorithm end up getting measured at the end, so aren't true ancillas, so never mind.

daxfohl commented 1 year ago

nondeterministic circuits

I'm an idiot. I spent six months working on adding nondeterministic circuits to cirq....

In particular, CircuitOp.repeat_until is completely nondeterministic. It can't be unrolled, because it could go forever. It also can't be turned into a deferred measurement. So ostensibly a runtime allocator would be useful there, even though there's no way to guarantee it won't OOM.

Alternatively you could specify that a dynamic qubit allocation within a subcircuit has to be deallocated within that same subcircuit, avoiding the OOM potential. Then that would also then become deterministic (in terms of circuit layout), as each repetition could reuse the same ancilla, even if the number of repetitions is nondeterministic. So could be done at circuit building time rather than runtime.

So yeah, I think focusing on the use case where the tooling transforms a circuit with ancillas to a fully-defined circuit is the more applicable for Cirq. Runtime allocation is more useful if you really want to move toward Turing-completeness with the classical circuit model, like the Q# approach. But right now Cirq doesn't have enough classical logic primitives to make that worthwhile.

NoureldinYosri commented 1 year ago

@daxfohl It's indeed the latter with the intention to implement different strategies that optimize for different properties (e.g. circuit width, depth ..)

As for nondeterministic circuits. if these circuits/subcircuits gaurentee that the qubits will be clean (returned to their initial state) at the end then they can be created using BorrowableQubits. this way the number of qubits will be constant without imposing any restriction on the circuit (e.g. forcing them to manually deallocate ancillas)

Correction: the type that will always be gauranteed to be clean is CleanQubit not BorrowableQubit

95-martin-orion commented 1 year ago

Potential use case: part of the reason we have CircuitOperation instead of CircuitGate is because at the time Cirq had no concept of ancilla / placeholder qubits. With this change, we could plausibly define a CircuitGate and thus ensure that all Operations are GateOperations.

...then again, that would be a pretty big change for a relatively minor improvement. Certainly not a high-priority task to pursue, if indeed we decided to pursue it at all.

dstrain115 commented 1 year ago

I am concerned that the QubitManager is a global state. In general, global state is generally bad practice, as it is tricky to make statements or guarantees about such objects. Other code unrelated to you can change it.

Using a context manager is much better, but then this has its own gotchas if two separate pieces of code

I also think that kwargs should be passed to the QubitManager so that you can potentially adjust behavior based on custom parameters. (Most obviously, you may want to label the ancilla qubits in some way). While resource estimation might not care about this 'temporary' state, I would be hesitant to say that no users care about how this works. If nothing else, it would be good for debugging if you get some weird ancillae that you were not expecting and want to know where they came from.

Other than these two concerns, I like it. It's exciting to see this!

daxfohl commented 1 year ago

+1 to the above. I think a good number (all?) of cirq protocols could allow kwargs for easier customization. If decompose accepted kwargs, it would be possible to do a cirq.decompose(op, qubit_manager=my_qubit_manager) directly here without making any underlying code changes at all.

tanujkhattar commented 1 year ago

@daxfohl I updated the original text with some more context around how simulators would be affected by this change. Can you please take a look and share your thoughts?

daxfohl commented 1 year ago

So you mean doing the allocations at runtime rather than compile time?

Should be doable. First have to undo #5748, and re-add the SimulationState.with_qubits method (have to figure out how to make mypy happy), which will give you the ability to add ancillas at runtime. Then you have to add a similar SimulationState.without_qubits that uses self.factor in the same way, to remove ancillas at runtime. (Note you probably want to make those mutating since this is going to be directly updating the simulator state; the with_qubits implementation was pure IIRC.)

Then in SimulationState.strat_act_on_from_apply_decompose, you'll need to add some way of identifying all the ancillas created in the _try_decompose_into_operations_and_qubits step, which might be challenging because that function is weird (it always returns LineQubits and you have to map them back into the actual qubits the operation uses). I'd think you could assume in line 99 that all "extras" in qubits1 are the ancillas, and so then you could create new "UuidQubit" (or something, to avoid collisions with existing qubits) for each on the spot. Then, you use the above functions you created to add the ancillas to the state, iterate through the decomposed ops, then remove the ancillas before returning (optionally calling reset on each of them first to ensure they factor cleanly).

This would generically give you the ability to do the allocations at runtime in all simulators that support kron and factor. (Density Matrix and State Vector do; I couldn't figure out a factor algo for Clifford, and would bet that if one exists it is exponential in the general case (though it might be possible to have measure/reset force it to an easily factorizable representation); factor for MPS should be trivial but I never got around to it).

The hard part may be getting the QubitManager in there somehow. First draft I guess you can use a global config. But I'll go back to my above comment and say all the protocols and handlers should be required to support and pass through **kwargs. Then you could just pass it through act_on for free.

daxfohl commented 1 year ago

I figured I'd try it out. https://github.com/daxfohl/Cirq/commit/548e61f363deb97668076c9b696b48aa19577c28

Here we make an ancilla'd X gate by X'ing a new ancilla, CX'ing from the ancilla to the target, then un-X'ing the ancilla to get it back to zero. Seems to work equivalently to X gate for state vector and density matrix simulators.

tanujkhattar commented 1 year ago

One interesting use case would be the deferred measurements transformer. Would it be possible to rewrite this in terms of ancillas? Hmm, actually maybe not. The qubits added in that algorithm end up getting measured at the end, so aren't true ancillas, so never mind

@daxfohl I think it should be possible to use the new proposed qalloc to replace the manual allocation of ancillas in deferred measurement transformer here: https://github.com/quantumlib/Cirq/blob/f1149962b33d15d8bdbb3d00a6a8e453924d79da/cirq-core/cirq/transformers/measurement_transformers.py#L107

But I'm not sure if this is what you had in mind when you say "Would it be possible to rewrite this in terms of ancillas? ". In general, qalloc / qfree are convenience methods for users to not have to manually keep track of newly allocated qubits when they don't need to (which is kind of what's happening already in deferred measurement transformer).

Potential use case: part of the reason we have CircuitOperation instead of CircuitGate is because at the time Cirq had no concept of ancilla / placeholder qubits.

@95-martin-orion Could you elaborate more? The newly added ancilla / placeholder qubit types are still valid cirq qubits (they derive from cirq.Qid). It's not clear to me how having these new types enables us to have a CircuitGate ?

daxfohl commented 1 year ago

I'm not sure where I was going. Both things kind of create qubits out of thin air, so it seemed maybe there was some overlap that could be wheedled out. But I don't think so. Ancillas have to be freed at the end of the operation. "MeasurementQids" from deferred measurements have to stay around until the end of the circuit. So it's not quite the same thing.

95-martin-orion commented 1 year ago

Potential use case: part of the reason we have CircuitOperation instead of CircuitGate is because at the time Cirq had no concept of ancilla / placeholder qubits.

@95-martin-orion Could you elaborate more? The newly added ancilla / placeholder qubit types are still valid cirq qubits (they derive from cirq.Qid). It's not clear to me how having these new types enables us to have a CircuitGate ?

[Disclaimer: the only benefit of this is to (potentially) make the code cleaner. Net value added is likely small.]

Some of the details are lost to history, but I have a couple of docs on the original CircuitGate plan that I've shared with you out-of-band. The surface-level issue I'm remembering is that CircuitGate needed to provide qubits to its subcircuit, but Gate has no qubits and gluing "real" qubits to a Gate was essentially an Operation, even if they were intended as placeholder qubits.

To take this one step further: we could have all Gates have ancilla qubits to represent the qubits they act on. This would make them more like functions - Gate.on would directly map real qubits to placeholders, instead of the nebulous behavior we have today (e.g. CZ and the extended "control/target" discussion)

daxfohl commented 1 year ago

Oh, I know what I was thinking. Related to the add_qubits in the POC I linked above. If we have that way to dynamically add qubits in-situ during a simulation, it'd be possible to do the deferred measurements at runtime rather than ahead of time. Previously we had a perform_measurements option for density matrix simulator that would replace measurement gates with phase damp gates. But with the addition of CCOs that became insufficient and had to be removed in favor of the deferred measurement transformer.

But if we had the ability to add qubits during the simulation, SimulationState could have a _deferred_measurement mode, which, in SimulationState.measure would dynamically add a MeasurementQid instead of actually measuring the qubit. The only problem then is that the logic for checking the classical state for gating classically controlled ops is in the CCO itself, so it can't be overridden in the SimulationState class. However it should be straightforward to move that code to SimulationState. Then once that's done you get a runtime deferred measurement transform.

Update: POC of basic functionality https://github.com/quantumlib/Cirq/commit/f7de78d3b851e5aabe767458dc256248d0a9f7e6.

Is it pretty? Not really, but it's nice that it works for multiple simulators even though only coded once.
Does it work? Yes, though it would break product mode, but that could be patched too.
Is it worth it? Probably not, unless there's some compelling reason.
Other comments? It does require duplicate logic there and in the AOT transformer. That could be fixed by creating a LoggingSimulationState that simply logs every operation it has seen, which the AOT transformer could use to generate the transformed circuit. But I digress....

tanujkhattar commented 1 year ago

I am concerned that the QubitManager is a global state. In general, global state is generally bad practice, as it is tricky to make statements or guarantees about such objects. Other code unrelated to you can change it.

@dstrain115 I agree that it would have been nicer if we didn't need to maintain a global state and could pass around qubit manager everywhere. I thought hard about potential ways of passing around qubit manager and I concluded that it's best to go with a global state given the tradeoffs. Let me provide some more context as follows.

We need `qalloc`/`qfree` inside `_decompose_` method of a `Gate`

We need to sneak in our qalloc / qfree methods inside the _decompose_ method of a Gate / Operation. There are a few ways to make this happen without having global methods

Way-1: Update the signature of `_decompose_` protocol to accept `qubit_manager` argument.

This would be backwards incompatible and be a huge breaking change.
Specifically, even if you add an argument with a default of None, how do you go and update all the existing user code that define a def _decompose_(self, qubits)?
This also has second order backwards incompatible effects because cirq.decompose_once() protocol would need to be updated to call gate._decompose_(qubits, qubit_manager); but not all gates would support a qubit_manager as mentioned in the above point.
Another second order effect that is that we'll have to update the interface of intercepting_decomposer and fallback_decomposer since the OpDecomposer definition would now also need to include a parameter for the qubit manager.

Way-2: Introduce a new `_decompose_with_qubit_manager_` protocol that accepts a `qubit_manager` argument.

This would potentially solve the backwards incompatibility problems of Way-1 but still expect all new and existing code to use cirq.decompose_wiith_qubit_manager instead of cirq.decompose as the default decomposition protocol, which would be very cumbersome to update.
Another drawback here is that existing code that uses cirq.decompose would not automatically be compatible with newly written operations that use newly allocated ancillas in their decomposition

We need a qubit allocation strategy at circuit level scope or above.

Ideally, the qubit management should happen at the scope of a circuit. A "global" scope is probably too large but an operation level scope is too restrictive because in Cirq, gates and operations can exist without a circuit.
- This is why, for example, expecting a qubit manager as a qubit source instead of qubits in op = gate.on(qubit_manager) is not ideal because operations can exist without caring about a qubit allocation strategy for a specific circuit.
The way this is currently handled in the current design is that the context manager is used to define a local scope (which is broader than circuit scope but should ideally be the same) and a default SimpleQubitManager + post-process transformer is the recommended strategy where SimpleQubitManager ensures that the compute graph is captured without adding additional dependencies and a post-process transformer ensures that the actual greedy allocation strategy is followed at the scope of the circuit.

daxfohl commented 1 year ago

It's possible to use inspect.signature and only pass in qubit_manager when the target method accepts it. Here is an example of where that was done for backwards compatibility. https://github.com/quantumlib/Cirq/pull/4789/files#diff-4e026f4892ce7bbefa6a342ec72c69a62dde54695e5dcbbe5171a2aeb5c73e5f

tanujkhattar commented 1 year ago

Okay, I have a working prototype that does not require a global state of qalloc / qfree and instead uses a new _decompose_with_qubit_manager_ protocol. I updated cirq.decompose and it's variants accordingly to first try decomposition with a qubit manager and fallback on regular decompose .

FYI @daxfohl @dstrain115, please see https://github.com/tanujkhattar/Cirq/commit/f05f7d7d204705f35588e2074a9b48772df5ff48 and let me know if you have any initial feedback. In terms of next steps, I'll mark the issue as accepted and create subtasks assuming we add a new decompose protocol and don't maintain a global state. If we discover any further issues with using a new decompose protocol, we can always fallback the original global state approach.

NoureldinYosri commented 1 year ago

The problem recognized above is cirq.decompose protocol for both its importance and the way it works.

In addition to supporting a qubit manager there are other related considerations.

The goal question. Why do we decompose? Maybe we want the decomposition that gives the best cost of a specific gateset (e.g. be able to do something like g._decompose_(goal = T_COUNT) or g._decompose_(goal = Toffoli_COUNT)
The what question. e.g. g._decompose_(model = 'arxiv:1234')

I think instead instead of _decompose_with_qubit_manager_ we need something like _decompose_with_options_(self, qubits, qubit_manager: cirq.QubitManager, cost_model: cirq.SupportedCostModels = cirq.SupportedCostModels.T_COUNT, from_model: bool = True) (function name is just a suggestion) that uses a qubit_manager for ancilla allocation, and returns the decomposition from model (e.g. paper) if a model is supplied or a decomposition that minimizes the cost of the given target cost_model.

This would work the same as in https://github.com/quantumlib/Cirq/issues/6040#issuecomment-1493658988 but will allow us to support different cost models or experimenting with novel decompositions.

The model should be supplied in the constructor and can be either an OP_TREE generator that accepts a qubit_manager or an enum value (e.g. a paper number)

SomeGate() # default None
SomeGate(model=SomeGate.Models.ARXIV_1234)
SomeGate(model=some_OP_TREE)

prototype implementation

```py3 class SupportedCostModels(Enum): T_COUNT: int = 0 TOFFOLI_COUNT: int = 1 #... etc class SomeGate: class Models(cirq.GateModels): DEFAULT: int = 0 ARXIV_123: int = 1 ARXIV_456: int = 2 @classmethod def select(cls, model=None, cost_model=None) -> DecomposerFunc[[Sequence[qubits], cirq.QubitManager], cirq.OP_TREE]: """Returns a decomposer matching either requested gate model or one that optimizes the given cost_model."""" def _decompose_with_options_(self, qubits, qubit_manager: cirq.QubitManager, cost_model: cirq.SupportedCostModels = cirq.SupportedCostModels.T_COUNT, from_model: bool = True): if from_model and self.model_func is not None: # A supplied model is preferred yield from self.model_func(qubits, qubit_manager) return # yield the model that optimizes the given cost target. yield from Models.select(cost_model=cost_model)(qubits, qubit_manager) ```

daxfohl commented 1 year ago

Largely agree with @NoureldinYosri above. "Why we decompose", is something I still have a hard time with after two years on the project, and I know end users ask about this too. I think Cirq team needs to take the opportunity to step back and come up with a holistic answer there.

One extra note, should GateSets perhaps be involved? I'd think you want to decompose to a gate set somehow. Or maybe decomposition should be external to the gate itself, since there are always multiple ways to decompose a gate. I don't have an answer, but I think there needs to be something more unified rather than yet another monkey patch regarding decomposition.

NoureldinYosri commented 1 year ago

@daxfohl

One extra note, should GateSets perhaps be involved? I'd think you want to decompose to a gate set somehow. Or maybe decomposition should be external to the gate itself, since there are always multiple ways to decompose a gate. I don't have an answer, but I think there needs to be something more unified rather than yet another monkey patch regarding decomposition.

This is why I'm introducing these ways of instantiating a gate

SomeGate(model=SomeGate.Models.ARXIV_1234)
SomeGate(model=some_OP_TREE_generator)

so for example if a decomposition targeting a specific gateset was introduced in an a paper say ARXIV_XXX. Then either we have it and the user can simply create an instance as SomeGate(model=SomeGate.Models.ARXIV_XXX). or if we don't have it then they can supply it to constructor and expect that everytime that gate is decomposed their provided decomposition will be used.

def custom_decomposer(qubits, qubit_manager) -> cirq.OP_TREE:
     """yields and custom decomposition"""

SomeGate(model=custom_decomposer)

tanujkhattar commented 1 year ago

Or maybe decomposition should be external to the gate itself

I have explored this in the past and I think this would be a huge breaking change for Cirq IMO. Qiskit supports doing decompositions external to the gate itself by maintaining an EquivalenceLibrary, but the reason it works is that they have a very restricted set of types that can be "parameters" to a gate/operation. And for each (gate_name, num_qubits); you add an equivalence from (gate_parameters -> decomposed_circuit).

For us, a gate can be any class that can accept any arbitrary parameters (including other parameterized gates; eg: for ControlledGate). And, decomposition of gate can depend upon arbitrary conditions on these parameters. Therefore, capturing this complexity externally for all gates and for all ranges of parameters for those gates would mean that we expose an API like the intercepting_decomposer / fallback_decomposer that take a gate/operation instance and maybe a bunch of options (like qubit manager) and return an appropriate decomposition. However, this has the risk that everytime we update the parameters of a gate; the such op decomposer implementations external to the gate class would also need to be updated.

Note that even if we make such an OpDecomposer library a more formal concept, it can be passed directly to the existing cire.decompose protocol as an intercepting_decomposer and therefore I'm not so sure if we really gain much.

The idea of the decompose protocol is to provide a way to define the action of a composite gate in terms of other "simpler" gates, by potentially allocating new ancilla qubits as part of defining this action. For the purpose of this issue, I don't think we need to generalize this even more to the point where we can pick an "optimal" decomposition depending on the use case -- this is something that can be achieved by passing an intercepting_decomposer instance as well. Although I agree that having a more structured way to achieve this would be useful.

I propose that we open a separate issue to brainstorm the generalization of decompose protocol and discuss "How to pick an optimal decomposition for a gate". For the sake of this issue, I'd like to keep the discussion focussed on "How can I define the action of a composite gate in terms of other simpler / known gates; by potentially using additional ancilla qubits?".

What do you guys think? @NoureldinYosri @daxfohl

NoureldinYosri commented 1 year ago

@tanujkhattar on the one hand I don't want to widen the scope of this issue. however I do believe that the discussion around the decompose protocol is important here due to the following reasons

The decompose protocol is central to this issue and is the main reason we have to use a context manager so that on the one hand we don't break the protocol and on the other have a way of injecting the qubit manager into the logic of _decompose_ method and the decompose protocol. In other words we are dancing around two problems: trying to avoid globals and trying not break the decompose protocol
https://github.com/quantumlib/Cirq/issues/6040#issuecomment-1493658988 patches the protocol with _decompose_with_qubit_manager_ and more importantly an introspection method (inspect.signature) and personally I'm not a fan of introspection methods since they always point to a design issue.
I had to create a new flavour of the decompose protocol to support one complexity class.
Even with my implementation we still have no way to support other complexity classes without either breaking cirq.decompose or patching the one I created.
The intercepting decomposer to me feel like a batch in and of itself and has its own short comings (e.g. we can't mix different decomposition of the same gate .. can't do something like g1, g2 = Gate(model='arxiv:123'), Gate(model='arxiv:456') where g1, g2 are instances of the same gate but with different decompositions)

Thus as @daxfohl mentioned we probably should use this as an opportunity to rethink the decompose protocol to make our lives easier for the goals dependent on it:

supporting qubit management
doing resource estimation
support defining per gateset decompositions.
supporting experimentaion with different decompositions.

daxfohl commented 1 year ago

I don't have any additional thoughts on decompose, I think you two have covered it well.

@tanujkhattar I just looked at your proposed solution and it does bring up an interesting question. For the unitary protocol, there's really no way to identify which qubits are ancillas in order to remove them afterward. But they probably should be extracted. It's straightforward to do the linalg if you know which axes to extract. But identifying those axes, no idea.

I also don't know the theory, whether gates with ancillas are guaranteed to be unitary in the non-ancilla qubits. Seems like they should be, but if that's the case then why are they necessary?

tanujkhattar commented 1 year ago

@daxfohl

For the unitary protocol, there's really no way to identify which qubits are ancillas in order to remove them afterward. But they probably should be extracted. It's straightforward to do the linalg if you know which axes to extract. But identifying those axes, no idea.

See my sample implementation in https://github.com/quantumlib/Cirq/issues/6101. In general, for operations we know which qubits they act on before calling cirq.decompose(op) and for gates, we explicitly choose a set of qubits and pass them to the decompose protocol. Any additional qubits in the decomposed OP_TREE are ancillas and we should assume that they are cleaned up within the decompose protocol. We only support allocating ancillas which should be cleaned up within the decompose protocol -- this is slightly restrictive but makes everything much easier.

daxfohl commented 1 year ago

It works, but I can't help but feel this is something that should be done at a lower level. Otherwise there's going to be logic removing ancillas scattered around lots of different places in the codebase.

It may be hard to change the low level code in a backwards compatible way though. Perhaps defer to 2.0 and introduce some breaking changes that allow doing things more cleanly. I think there's a reasonable argument to be made that strict adherence to backwards compatibility has slowed down dev velocity and made the user experience suboptimal. Planning some breaking changes for a 2.0 cleanup release may be a good kickstart to the product.

tanujkhattar commented 1 year ago

It works, but I can't help but feel this is something that should be done at a lower level. Otherwise there's going to be logic removing ancillas scattered around lots of different places in the codebase.

Can you give an example of what do you mean by "lower level" ?

viathor commented 1 year ago

Borrowing qubits is entirely fine in the ideal world of unitary operations. On NISQ devices all operations are actually noisy quantum channels. For example, CNOT12, Z2, CNOT12, Z2 implements the Z1 gate and restores the second qubit to whatever state it was before it was borrowed for the operation. However, when we replace the four gates with their noisy equivalents this is no longer true.

That said, this feature can be useful to make it easier and more automated to make an effective use of a quantum processor. I'd just keep in mind that this is an idealization. Therefore, the user should be made aware of the decisions made by memory management and should be able to affect them and override them. IOW, memory management for NISQ devices should be very much unlike the memory management on a classical computer which hides details from the programmer.

For example, there is a trade-off is between borrowing otherwise inactive qubits and applying dynamical decoupling to them. This should be something the programmer gets to decide.

We only support allocating ancillas which should be cleaned up within the decompose protocol -- this is slightly restrictive but makes everything much easier.

See this comment for some ideas on how to test this. There I assumed we may want to keep decompositions that don't return auxiliary qubits to the initial state, but I agree with you that restricting to just those that do (and hence applying the tests proposed in the linked comment to all _decompose_ functions) is simpler and makes a lot of sense.

daxfohl commented 1 year ago

@tanujkhattar I don't have an example of "lower level", just thinking that the ancilla handling needs to be somewhere such that you don't have to explicitly deal with it in all the various protocols that work on operations and gates. But it shouldn't be hidden away completely; there are situations where you might need to explicitly handle the ancillas. Judging by @viathor's comment, those situations may be more common than not in actual quantum research.

Anyway that's all I was trying to convey.

shef4 commented 1 year ago

I'm really interested in helping with the QubitManager interface if it still needs someone to work on it.

Questions:

Are there any resources and demo code I should fork to start with?
does Cirq follow a specific interface style or would it be OK to compare how other languages like Qiskit and Q# approach it?

NoureldinYosri commented 1 year ago

hey @shef4 the qubit manager has a couple of tasks remaining for it

most protocols don't support it. We added support for it in a couple of them (see. #6112 and #6081). The full list of places that need to be updated is not clear but https://github.com/quantumlib/Cirq/pull/6112#issuecomment-1570881175 is a good starting point
there are two sides of qubit management. the first is online qubit management and by that I mean employing some allocation strategy on the fly to reduce the number of allocated qubits to maximize reuse. for now only have one that does that and that the GreedyQubitManager. If you have a better allocation strategy you create a new qubit manager that uses it and in your PR you should add the improved qubit counts
the second side of qubit management is post processing/transformation. regardless of the efficiency of strategy employed by the qubit manager it can't be optimal because it has to make conservative choices because it has only local information. post processing transforms on the other hand receive the entire circuit as input so they have to be at least as good. At the moment we have only one such transform map_clean_and_borrowable_qubits

the first task is easiest and doesn't require quantum knowledge. the second and the third also don't require quantum knowledge but they require knoweldge of graphs and algorithms.

shef4 commented 1 year ago

Thanks for explaining! I'll start working on the first task so I get a quick start.

the online qubit management sounds very cool though. From what I understood it reminds me of concurrent resource allocation but instead of threads & processing time, it's qubit memory & the duration of allocation.

Is dead/live lock something that needs to be considered?
Would this description of the problem as a graph be accurate?
- finding the "best" qubit while considering the "cost" for a specific qubit relative to others. best: the quickest to access cost: the time duration the qubit will be allocated for

if so I might look into trying topologically sort on the current free qubits beforehand so the manager does have to wait when it asks for a free qubit. The main issue I see is that it would need to be resorted when 1 or more qubits are deallocated. This might be O(N*log(M)). N - number of deallocated qubits to add to the sorted list M - length of the sorted list

I'll start researching other graph algorithms that might be useful/similar to the main problem.

NoureldinYosri commented 1 year ago

@shef4 no deadlocks are not applicable here since there is no interdependency between qubits and allocating qubits happens sequentially.

there are multiple valid mappings to graph problems. the problem can be states as interval coloring (the qubit are colors), clique cover (qubits are assigned per clique), and even as a scheduling problem. I don't really understand your mapping so I can't judge.

shef4 commented 1 year ago

Thanks for the clarification, I see what you saying and the problem statements are super helpful

ishmum123 commented 10 months ago

hey @shef4 the qubit manager has a couple of tasks remaining for it

most protocols don't support it. We added support for it in a couple of them (see. Add support for allocating qubits in decompose to cirq.unitary #6112 and Update cirq simulators to work for cases where an operation allocates new qubits as part of it's decomposition. #6081). The full list of places that need to be updated is not clear but Add support for allocating qubits in decompose to cirq.unitary #6112 (comment) is a good starting point

there are two sides of qubit management. the first is online qubit management and by that I mean employing some allocation strategy on the fly to reduce the number of allocated qubits to maximize reuse. for now only have one that does that and that the GreedyQubitManager. If you have a better allocation strategy you create a new qubit manager that uses it and in your PR you should add the improved qubit counts

the second side of qubit management is post processing/transformation. regardless of the efficiency of strategy employed by the qubit manager it can't be optimal because it has to make conservative choices because it has only local information. post processing transforms on the other hand receive the entire circuit as input so they have to be at least as good. At the moment we have only one such transform map_clean_and_borrowable_qubits

the first task is easiest and doesn't require quantum knowledge. the second and the third also don't require quantum knowledge but they require knoweldge of graphs and algorithms.

Can I take on the second task?

NoureldinYosri commented 10 months ago

@ishmum123 you are free to propose a new qubit manager and whether it gets merged depends on its performance and efficiency. items 2 and 3 are less of task and more of goals that we want to achieve.

tanujkhattar commented 4 months ago

We have added an interface for QubitManager to Cirq and updated protocols (like decompose protocol and simulators) to support ancilla allocation and deallocations hidden inside the decomposition of composite gates. We also have implementations for a GreedyQubitManager and SimpleQubitManager.

This issue is now complete and can be closed.

quantumlib / Cirq

Quantum Memory Management for Cirq #6040

RFC

Background

Key Observations

Qubit allocations can introduce new implicit dependencies between operations

Borrowing qubits cannot be supported during circuit construction

Cirq fails if a gate decomposes into an OP_TREE with operations on newly allocated ancilla qubits.

Design Details

Circuit construction

Circuit Simulations

Example code snippets

Conclusion

Update - 3rd April 2023

We need `qalloc`/`qfree` inside `_decompose_` method of a `Gate`

Way-1: Update the signature of `_decompose_` protocol to accept `qubit_manager` argument.

Way-2: Introduce a new `_decompose_with_qubit_manager_` protocol that accepts a `qubit_manager` argument.

We need a qubit allocation strategy at circuit level scope or above.

quantumlib / Cirq

Quantum Memory Management for Cirq #6040

RFC

Background

Key Observations

Qubit allocations can introduce new implicit dependencies between operations

Borrowing qubits cannot be supported during circuit construction

Cirq fails if a gate decomposes into an OP_TREE with operations on newly allocated ancilla qubits.

Design Details

Circuit construction

Circuit Simulations

Example code snippets

Conclusion

Update - 3rd April 2023

We need qalloc/qfree inside _decompose_ method of a Gate

Way-1: Update the signature of _decompose_ protocol to accept qubit_manager argument.

Way-2: Introduce a new _decompose_with_qubit_manager_ protocol that accepts a qubit_manager argument.

We need a qubit allocation strategy at circuit level scope or above.

We need `qalloc`/`qfree` inside `_decompose_` method of a `Gate`

Way-1: Update the signature of `_decompose_` protocol to accept `qubit_manager` argument.

Way-2: Introduce a new `_decompose_with_qubit_manager_` protocol that accepts a `qubit_manager` argument.