Closed tanujkhattar closed 4 months ago
I'm interested in helping out on this project if possible.
This project reminds me of #4100 and related PRs, which allow simulators to perform implicit qubit (de-)allocation when operating on definite qubits. It seems that the primary features being proposed here are:
QubitManager
s to encapsulate different strategies for allocation.If I understand correctly, none of this is meant to affect existing, definite-qubit circuits - only new circuits which make use of allocated qubits will be affected. Is this an accurate summary of the proposal?
(Apologies if this is just restating what you've already said - writing it out mostly helped me make sense of it myself.)
If I understand correctly, none of this is meant to affect existing, definite-qubit circuits - only new circuits which make use of allocated qubits will be affected. Is this an accurate summary of the proposal?
That is absolutely correct! The goal is to provide a framework for users to easily construct circuits composed of composite operations (aka gadgets) that can use additional ancillas "under the hood" (i.e. as part of their decompose protocol).
I'm not quite clear, do the allocations occur at runtime, or is there a transformer by which you take a circuit definition that has ancilla-using ops, and transforms that into a fully-defined circuit that you can run?
Knee-jerk reaction is that the latter would be preferable, since the former could lead to space errors halfway through simulation, whereas the latter you know exactly how many qubits you need before simulating. The former may be useful if we have nondeterministic circuits (in which case routing the ancillas in advance may be impossible), but we don't currently have that.
Aside, it may be useful to have a PC memory requirement calculator for simulation, since we should be able to determine the maximum entanglement for a circuit prior to running it.
One interesting use case would be the deferred measurements transformer. Would it be possible to rewrite this in terms of ancillas? Hmm, actually maybe not. The qubits added in that algorithm end up getting measured at the end, so aren't true ancillas, so never mind.
nondeterministic circuits
I'm an idiot. I spent six months working on adding nondeterministic circuits to cirq....
In particular, CircuitOp.repeat_until
is completely nondeterministic. It can't be unrolled, because it could go forever. It also can't be turned into a deferred measurement. So ostensibly a runtime allocator would be useful there, even though there's no way to guarantee it won't OOM.
Alternatively you could specify that a dynamic qubit allocation within a subcircuit has to be deallocated within that same subcircuit, avoiding the OOM potential. Then that would also then become deterministic (in terms of circuit layout), as each repetition could reuse the same ancilla, even if the number of repetitions is nondeterministic. So could be done at circuit building time rather than runtime.
So yeah, I think focusing on the use case where the tooling transforms a circuit with ancillas to a fully-defined circuit is the more applicable for Cirq. Runtime allocation is more useful if you really want to move toward Turing-completeness with the classical circuit model, like the Q# approach. But right now Cirq doesn't have enough classical logic primitives to make that worthwhile.
@daxfohl It's indeed the latter with the intention to implement different strategies that optimize for different properties (e.g. circuit width, depth ..)
As for nondeterministic circuits. if these circuits/subcircuits gaurentee that the qubits will be clean (returned to their initial state) at the end then they can be created using BorrowableQubits
. this way the number of qubits will be constant without imposing any restriction on the circuit (e.g. forcing them to manually deallocate ancillas)
Correction: the type that will always be gauranteed to be clean is CleanQubit
not BorrowableQubit
Potential use case: part of the reason we have CircuitOperation
instead of CircuitGate
is because at the time Cirq had no concept of ancilla / placeholder qubits. With this change, we could plausibly define a CircuitGate
and thus ensure that all Operation
s are GateOperation
s.
...then again, that would be a pretty big change for a relatively minor improvement. Certainly not a high-priority task to pursue, if indeed we decided to pursue it at all.
I am concerned that the QubitManager is a global state. In general, global state is generally bad practice, as it is tricky to make statements or guarantees about such objects. Other code unrelated to you can change it.
Using a context manager is much better, but then this has its own gotchas if two separate pieces of code
I also think that kwargs should be passed to the QubitManager so that you can potentially adjust behavior based on custom parameters. (Most obviously, you may want to label the ancilla qubits in some way). While resource estimation might not care about this 'temporary' state, I would be hesitant to say that no users care about how this works. If nothing else, it would be good for debugging if you get some weird ancillae that you were not expecting and want to know where they came from.
Other than these two concerns, I like it. It's exciting to see this!
+1 to the above. I think a good number (all?) of cirq protocols could allow kwargs for easier customization. If decompose
accepted kwargs, it would be possible to do a cirq.decompose(op, qubit_manager=my_qubit_manager)
directly here without making any underlying code changes at all.
@daxfohl I updated the original text with some more context around how simulators would be affected by this change. Can you please take a look and share your thoughts?
So you mean doing the allocations at runtime rather than compile time?
Should be doable. First have to undo #5748, and re-add the SimulationState.with_qubits
method (have to figure out how to make mypy happy), which will give you the ability to add ancillas at runtime. Then you have to add a similar SimulationState.without_qubits
that uses self.factor
in the same way, to remove ancillas at runtime. (Note you probably want to make those mutating since this is going to be directly updating the simulator state; the with_qubits
implementation was pure IIRC.)
Then in SimulationState.strat_act_on_from_apply_decompose
, you'll need to add some way of identifying all the ancillas created in the _try_decompose_into_operations_and_qubits
step, which might be challenging because that function is weird (it always returns LineQubits and you have to map them back into the actual qubits the operation uses). I'd think you could assume in line 99 that all "extras" in qubits1
are the ancillas, and so then you could create new "UuidQubit" (or something, to avoid collisions with existing qubits) for each on the spot. Then, you use the above functions you created to add the ancillas to the state, iterate through the decomposed ops, then remove the ancillas before returning (optionally calling reset on each of them first to ensure they factor cleanly).
This would generically give you the ability to do the allocations at runtime in all simulators that support kron and factor. (Density Matrix and State Vector do; I couldn't figure out a factor
algo for Clifford, and would bet that if one exists it is exponential in the general case (though it might be possible to have measure/reset force it to an easily factorizable representation); factor
for MPS should be trivial but I never got around to it).
The hard part may be getting the QubitManager
in there somehow. First draft I guess you can use a global config. But I'll go back to my above comment and say all the protocols and handlers should be required to support and pass through **kwargs
. Then you could just pass it through act_on
for free.
I figured I'd try it out. https://github.com/daxfohl/Cirq/commit/548e61f363deb97668076c9b696b48aa19577c28
Here we make an ancilla'd X gate by X'ing a new ancilla, CX'ing from the ancilla to the target, then un-X'ing the ancilla to get it back to zero. Seems to work equivalently to X gate for state vector and density matrix simulators.
One interesting use case would be the deferred measurements transformer. Would it be possible to rewrite this in terms of ancillas? Hmm, actually maybe not. The qubits added in that algorithm end up getting measured at the end, so aren't true ancillas, so never mind
@daxfohl I think it should be possible to use the new proposed qalloc
to replace the manual allocation of ancillas in deferred measurement transformer here:
https://github.com/quantumlib/Cirq/blob/f1149962b33d15d8bdbb3d00a6a8e453924d79da/cirq-core/cirq/transformers/measurement_transformers.py#L107
But I'm not sure if this is what you had in mind when you say "Would it be possible to rewrite this in terms of ancillas? ". In general, qalloc
/ qfree
are convenience methods for users to not have to manually keep track of newly allocated qubits when they don't need to (which is kind of what's happening already in deferred measurement transformer).
Potential use case: part of the reason we have CircuitOperation instead of CircuitGate is because at the time Cirq had no concept of ancilla / placeholder qubits.
@95-martin-orion Could you elaborate more? The newly added ancilla / placeholder qubit types are still valid cirq qubits (they derive from cirq.Qid
). It's not clear to me how having these new types enables us to have a CircuitGate
?
I'm not sure where I was going. Both things kind of create qubits out of thin air, so it seemed maybe there was some overlap that could be wheedled out. But I don't think so. Ancillas have to be freed at the end of the operation. "MeasurementQids" from deferred measurements have to stay around until the end of the circuit. So it's not quite the same thing.
Potential use case: part of the reason we have CircuitOperation instead of CircuitGate is because at the time Cirq had no concept of ancilla / placeholder qubits.
@95-martin-orion Could you elaborate more? The newly added ancilla / placeholder qubit types are still valid cirq qubits (they derive from
cirq.Qid
). It's not clear to me how having these new types enables us to have aCircuitGate
?
[Disclaimer: the only benefit of this is to (potentially) make the code cleaner. Net value added is likely small.]
Some of the details are lost to history, but I have a couple of docs on the original CircuitGate
plan that I've shared with you out-of-band. The surface-level issue I'm remembering is that CircuitGate
needed to provide qubits to its subcircuit, but Gate
has no qubits and gluing "real" qubits to a Gate
was essentially an Operation
, even if they were intended as placeholder qubits.
To take this one step further: we could have all Gate
s have ancilla qubits to represent the qubits they act on. This would make them more like functions - Gate.on
would directly map real qubits to placeholders, instead of the nebulous behavior we have today (e.g. CZ and the extended "control/target" discussion)
Oh, I know what I was thinking. Related to the add_qubits
in the POC I linked above. If we have that way to dynamically add qubits in-situ during a simulation, it'd be possible to do the deferred measurements at runtime rather than ahead of time. Previously we had a perform_measurements
option for density matrix simulator that would replace measurement gates with phase damp gates. But with the addition of CCOs that became insufficient and had to be removed in favor of the deferred measurement transformer.
But if we had the ability to add qubits during the simulation, SimulationState
could have a _deferred_measurement
mode, which, in SimulationState.measure
would dynamically add a MeasurementQid
instead of actually measuring the qubit. The only problem then is that the logic for checking the classical state for gating classically controlled ops is in the CCO itself, so it can't be overridden in the SimulationState
class. However it should be straightforward to move that code to SimulationState
. Then once that's done you get a runtime deferred measurement transform.
Update: POC of basic functionality https://github.com/quantumlib/Cirq/commit/f7de78d3b851e5aabe767458dc256248d0a9f7e6.
LoggingSimulationState
that simply logs every operation it has seen, which the AOT transformer could use to generate the transformed circuit. But I digress....I am concerned that the QubitManager is a global state. In general, global state is generally bad practice, as it is tricky to make statements or guarantees about such objects. Other code unrelated to you can change it.
@dstrain115 I agree that it would have been nicer if we didn't need to maintain a global state and could pass around qubit manager everywhere. I thought hard about potential ways of passing around qubit manager and I concluded that it's best to go with a global state given the tradeoffs. Let me provide some more context as follows.
qalloc
/qfree
inside _decompose_
method of a Gate
We need to sneak in our qalloc
/ qfree
methods inside the _decompose_
method of a Gate
/ Operation
. There are a few ways to make this happen without having global methods
_decompose_
protocol to accept qubit_manager
argument.def _decompose_(self, qubits)
? cirq.decompose_once()
protocol would need to be updated to call gate._decompose_(qubits, qubit_manager)
; but not all gates would support a qubit_manager as mentioned in the above point.intercepting_decomposer
and fallback_decomposer
since the OpDecomposer definition would now also need to include a parameter for the qubit manager. _decompose_with_qubit_manager_
protocol that accepts a qubit_manager
argument.cirq.decompose_wiith_qubit_manager
instead of cirq.decompose
as the default decomposition protocol, which would be very cumbersome to update.cirq.decompose
would not automatically be compatible with newly written operations that use newly allocated ancillas in their decompositionIdeally, the qubit management should happen at the scope of a circuit. A "global" scope is probably too large but an operation level scope is too restrictive because in Cirq, gates and operations can exist without a circuit.
op = gate.on(qubit_manager)
is not ideal because operations can exist without caring about a qubit allocation strategy for a specific circuit. The way this is currently handled in the current design is that the context manager is used to define a local scope (which is broader than circuit scope but should ideally be the same) and a default SimpleQubitManager + post-process transformer is the recommended strategy where SimpleQubitManager ensures that the compute graph is captured without adding additional dependencies and a post-process transformer ensures that the actual greedy allocation strategy is followed at the scope of the circuit.
It's possible to use inspect.signature
and only pass in qubit_manager
when the target method accepts it. Here is an example of where that was done for backwards compatibility. https://github.com/quantumlib/Cirq/pull/4789/files#diff-4e026f4892ce7bbefa6a342ec72c69a62dde54695e5dcbbe5171a2aeb5c73e5f
Okay, I have a working prototype that does not require a global state of qalloc
/ qfree
and instead uses a new _decompose_with_qubit_manager_
protocol. I updated cirq.decompose
and it's variants accordingly to first try decomposition with a qubit manager and fallback on regular decompose .
FYI @daxfohl @dstrain115, please see https://github.com/tanujkhattar/Cirq/commit/f05f7d7d204705f35588e2074a9b48772df5ff48 and let me know if you have any initial feedback. In terms of next steps, I'll mark the issue as accepted and create subtasks assuming we add a new decompose protocol and don't maintain a global state. If we discover any further issues with using a new decompose protocol, we can always fallback the original global state approach.
The problem recognized above is cirq.decompose
protocol for both its importance and the way it works.
In addition to supporting a qubit manager there are other related considerations.
g._decompose_(goal = T_COUNT)
or g._decompose_(goal = Toffoli_COUNT)
g._decompose_(model = 'arxiv:1234')
I think instead instead of _decompose_with_qubit_manager_
we need something like _decompose_with_options_(self, qubits, qubit_manager: cirq.QubitManager, cost_model: cirq.SupportedCostModels = cirq.SupportedCostModels.T_COUNT, from_model: bool = True)
(function name is just a suggestion) that uses a qubit_manager
for ancilla allocation, and returns the decomposition from model
(e.g. paper) if a model is supplied or a decomposition that minimizes the cost of the given target cost_model
.
This would work the same as in https://github.com/quantumlib/Cirq/issues/6040#issuecomment-1493658988 but will allow us to support different cost models or experimenting with novel decompositions.
The model should be supplied in the constructor and can be either an OP_TREE
generator that accepts a qubit_manager
or an enum value (e.g. a paper number)
SomeGate() # default None
SomeGate(model=SomeGate.Models.ARXIV_1234)
SomeGate(model=some_OP_TREE)
Largely agree with @NoureldinYosri above. "Why we decompose", is something I still have a hard time with after two years on the project, and I know end users ask about this too. I think Cirq team needs to take the opportunity to step back and come up with a holistic answer there.
One extra note, should GateSets perhaps be involved? I'd think you want to decompose to a gate set somehow. Or maybe decomposition should be external to the gate itself, since there are always multiple ways to decompose a gate. I don't have an answer, but I think there needs to be something more unified rather than yet another monkey patch regarding decomposition.
@daxfohl
One extra note, should GateSets perhaps be involved? I'd think you want to decompose to a gate set somehow. Or maybe decomposition should be external to the gate itself, since there are always multiple ways to decompose a gate. I don't have an answer, but I think there needs to be something more unified rather than yet another monkey patch regarding decomposition.
This is why I'm introducing these ways of instantiating a gate
SomeGate(model=SomeGate.Models.ARXIV_1234)
SomeGate(model=some_OP_TREE_generator)
so for example if a decomposition targeting a specific gateset was introduced in an a paper say ARXIV_XXX. Then either we have it and the user can simply create an instance as SomeGate(model=SomeGate.Models.ARXIV_XXX)
. or if we don't have it then they can supply it to constructor and expect that everytime that gate is decomposed their provided decomposition will be used.
def custom_decomposer(qubits, qubit_manager) -> cirq.OP_TREE:
"""yields and custom decomposition"""
SomeGate(model=custom_decomposer)
Or maybe decomposition should be external to the gate itself
I have explored this in the past and I think this would be a huge breaking change for Cirq IMO. Qiskit supports doing decompositions external to the gate itself by maintaining an EquivalenceLibrary, but the reason it works is that they have a very restricted set of types that can be "parameters" to a gate/operation. And for each (gate_name, num_qubits)
; you add an equivalence from (gate_parameters -> decomposed_circuit)
.
For us, a gate can be any class that can accept any arbitrary parameters (including other parameterized gates; eg: for ControlledGate). And, decomposition of gate can depend upon arbitrary conditions on these parameters. Therefore, capturing this complexity externally for all gates and for all ranges of parameters for those gates would mean that we expose an API like the intercepting_decomposer
/ fallback_decomposer
that take a gate
/operation
instance and maybe a bunch of options (like qubit manager) and return an appropriate decomposition. However, this has the risk that everytime we update the parameters of a gate; the such op decomposer implementations external to the gate class would also need to be updated.
Note that even if we make such an OpDecomposer
library a more formal concept, it can be passed directly to the existing cire.decompose
protocol as an intercepting_decomposer
and therefore I'm not so sure if we really gain much.
The idea of the decompose protocol is to provide a way to define the action of a composite gate in terms of other "simpler" gates, by potentially allocating new ancilla qubits as part of defining this action. For the purpose of this issue, I don't think we need to generalize this even more to the point where we can pick an "optimal" decomposition depending on the use case -- this is something that can be achieved by passing an intercepting_decomposer
instance as well. Although I agree that having a more structured way to achieve this would be useful.
I propose that we open a separate issue to brainstorm the generalization of decompose protocol and discuss "How to pick an optimal decomposition for a gate". For the sake of this issue, I'd like to keep the discussion focussed on "How can I define the action of a composite gate in terms of other simpler / known gates; by potentially using additional ancilla qubits?".
What do you guys think? @NoureldinYosri @daxfohl
@tanujkhattar on the one hand I don't want to widen the scope of this issue. however I do believe that the discussion around the decompose protocol is important here due to the following reasons
decompose
protocol is central to this issue and is the main reason we have to use a context manager so that on the one hand we don't break the protocol and on the other have a way of injecting the qubit manager into the logic of _decompose_
method and the decompose protocol. In other words we are dancing around two problems: trying to avoid globals and trying not break the decompose protocol_decompose_with_qubit_manager_
and more importantly an introspection method (inspect.signature
) and personally I'm not a fan of introspection methods since they always point to a design issue.cirq.decompose
or patching the one I created.g1, g2 = Gate(model='arxiv:123'), Gate(model='arxiv:456')
where g1, g2
are instances of the same gate but with different decompositions)Thus as @daxfohl mentioned we probably should use this as an opportunity to rethink the decompose protocol to make our lives easier for the goals dependent on it:
I don't have any additional thoughts on decompose, I think you two have covered it well.
@tanujkhattar I just looked at your proposed solution and it does bring up an interesting question. For the unitary protocol, there's really no way to identify which qubits are ancillas in order to remove them afterward. But they probably should be extracted. It's straightforward to do the linalg if you know which axes to extract. But identifying those axes, no idea.
I also don't know the theory, whether gates with ancillas are guaranteed to be unitary in the non-ancilla qubits. Seems like they should be, but if that's the case then why are they necessary?
@daxfohl
For the unitary protocol, there's really no way to identify which qubits are ancillas in order to remove them afterward. But they probably should be extracted. It's straightforward to do the linalg if you know which axes to extract. But identifying those axes, no idea.
See my sample implementation in https://github.com/quantumlib/Cirq/issues/6101. In general, for operations we know which qubits they act on before calling cirq.decompose(op)
and for gates, we explicitly choose a set of qubits and pass them to the decompose protocol. Any additional qubits in the decomposed OP_TREE are ancillas and we should assume that they are cleaned up within the decompose protocol. We only support allocating ancillas which should be cleaned up within the decompose protocol -- this is slightly restrictive but makes everything much easier.
It works, but I can't help but feel this is something that should be done at a lower level. Otherwise there's going to be logic removing ancillas scattered around lots of different places in the codebase.
It may be hard to change the low level code in a backwards compatible way though. Perhaps defer to 2.0 and introduce some breaking changes that allow doing things more cleanly. I think there's a reasonable argument to be made that strict adherence to backwards compatibility has slowed down dev velocity and made the user experience suboptimal. Planning some breaking changes for a 2.0 cleanup release may be a good kickstart to the product.
It works, but I can't help but feel this is something that should be done at a lower level. Otherwise there's going to be logic removing ancillas scattered around lots of different places in the codebase.
Can you give an example of what do you mean by "lower level" ?
Borrowing qubits is entirely fine in the ideal world of unitary operations. On NISQ devices all operations are actually noisy quantum channels. For example, CNOT12, Z2, CNOT12, Z2 implements the Z1 gate and restores the second qubit to whatever state it was before it was borrowed for the operation. However, when we replace the four gates with their noisy equivalents this is no longer true.
That said, this feature can be useful to make it easier and more automated to make an effective use of a quantum processor. I'd just keep in mind that this is an idealization. Therefore, the user should be made aware of the decisions made by memory management and should be able to affect them and override them. IOW, memory management for NISQ devices should be very much unlike the memory management on a classical computer which hides details from the programmer.
For example, there is a trade-off is between borrowing otherwise inactive qubits and applying dynamical decoupling to them. This should be something the programmer gets to decide.
We only support allocating ancillas which should be cleaned up within the decompose protocol -- this is slightly restrictive but makes everything much easier.
See this comment for some ideas on how to test this. There I assumed we may want to keep decompositions that don't return auxiliary qubits to the initial state, but I agree with you that restricting to just those that do (and hence applying the tests proposed in the linked comment to all _decompose_
functions) is simpler and makes a lot of sense.
@tanujkhattar I don't have an example of "lower level", just thinking that the ancilla handling needs to be somewhere such that you don't have to explicitly deal with it in all the various protocols that work on operations and gates. But it shouldn't be hidden away completely; there are situations where you might need to explicitly handle the ancillas. Judging by @viathor's comment, those situations may be more common than not in actual quantum research.
Anyway that's all I was trying to convey.
I'm really interested in helping with the QubitManager interface if it still needs someone to work on it.
Questions:
hey @shef4 the qubit manager has a couple of tasks remaining for it
the first task is easiest and doesn't require quantum knowledge. the second and the third also don't require quantum knowledge but they require knoweldge of graphs and algorithms.
Thanks for explaining! I'll start working on the first task so I get a quick start.
the online qubit management sounds very cool though. From what I understood it reminds me of concurrent resource allocation but instead of threads & processing time, it's qubit memory & the duration of allocation.
Is dead/live lock something that needs to be considered?
Would this description of the problem as a graph be accurate?
if so I might look into trying topologically sort on the current free qubits beforehand so the manager does have to wait when it asks for a free qubit. The main issue I see is that it would need to be resorted when 1 or more qubits are deallocated. This might be O(N*log(M)). N - number of deallocated qubits to add to the sorted list M - length of the sorted list
I'll start researching other graph algorithms that might be useful/similar to the main problem.
@shef4 no deadlocks are not applicable here since there is no interdependency between qubits and allocating qubits happens sequentially.
there are multiple valid mappings to graph problems. the problem can be states as interval coloring (the qubit are colors), clique cover (qubits are assigned per clique), and even as a scheduling problem. I don't really understand your mapping so I can't judge.
Thanks for the clarification, I see what you saying and the problem statements are super helpful
hey @shef4 the qubit manager has a couple of tasks remaining for it
- most protocols don't support it. We added support for it in a couple of them (see. Add support for allocating qubits in decompose to cirq.unitary #6112 and Update cirq simulators to work for cases where an operation allocates new qubits as part of it's decomposition. #6081). The full list of places that need to be updated is not clear but Add support for allocating qubits in decompose to cirq.unitary #6112 (comment) is a good starting point
- there are two sides of qubit management. the first is online qubit management and by that I mean employing some allocation strategy on the fly to reduce the number of allocated qubits to maximize reuse. for now only have one that does that and that the GreedyQubitManager. If you have a better allocation strategy you create a new qubit manager that uses it and in your PR you should add the improved qubit counts
- the second side of qubit management is post processing/transformation. regardless of the efficiency of strategy employed by the qubit manager it can't be optimal because it has to make conservative choices because it has only local information. post processing transforms on the other hand receive the entire circuit as input so they have to be at least as good. At the moment we have only one such transform map_clean_and_borrowable_qubits
the first task is easiest and doesn't require quantum knowledge. the second and the third also don't require quantum knowledge but they require knoweldge of graphs and algorithms.
Can I take on the second task?
@ishmum123 you are free to propose a new qubit manager and whether it gets merged depends on its performance and efficiency. items 2 and 3 are less of task and more of goals that we want to achieve.
We have added an interface for QubitManager
to Cirq and updated protocols (like decompose protocol and simulators) to support ancilla allocation and deallocations hidden inside the decomposition of composite gates. We also have implementations for a GreedyQubitManager
and SimpleQubitManager
.
This issue is now complete and can be closed.
RFC
Background
In Cirq, we apply gates on qubits to get operations, which can then be inserted into a container (Circuit) to construct a (implicit) compute graph. This pattern assumes that it’s straightforward for the users to identify the qubits on which a gate should act to yield operations.
However, as we gradually move beyond the NISQ era, this assumption faces a number of challenges listed below:
AND
gate as in https://arxiv.org/abs/1805.03662), it often becomes very tedious for users to correctly keep track of all the ancillas needed in the system.The goal of this design discussion is to propose a mechanism to enable cirq users to overcome the above mentioned challenges and more easily construct large circuits.
Key Observations
Qubit allocations can introduce new implicit dependencies between operations
Borrowing qubits cannot be supported during circuit construction
Cirq fails if a gate decomposes into an OP_TREE with operations on newly allocated ancilla qubits.
_num_qubits_
._decompose_
protocol to achieve the desired effect without exposing all the gory details.Design Details
Using the key observations above, the proposed high level design is as follows:
Circuit construction
qalloc
,qborrow
andqfree
methods which delegate calls to aQubitManager
. The choice of which qubit manager to delegate to can be controlled by a context manager.SimpleQubitManager
+ post-processing transformers but the framework is flexible enough to directly support other heuristic qubit allocation strategies at the time of circuit construction itself.SimpleQubitManager
that always allocates a new qubit for eachqalloc
/qborrow
call. This ensures that no implicit data dependencies are introduced at the time of circuit construction due to qubit allocation. In other words, the simple qubit manager uses an allocation strategy that maximises qubit width and minimized circuit depth.SimpleQubitManager
, we introduce two new internal qubit types -CleanQubit
andBorrowableQubit
.CleanQubit
/BorrowableQubit
s, and perform a smarter qubit allocation. At this step, different post processing transformers can use different qubit width / circuit depth tradeoff strategies and also inspect the compute graph to automatically support borrowing qubits.Circuit Simulations
Suppose we construct a circuit that contains an operation defined using it's
_decompose_
protocol and that uses additional ancillas invoked usingqalloc
/qfree
. The simulators would raise an error right now if we try to simulate such an operation because memory for a simulation is allocated in advance by inspectingcirq.num_qubits(op)
, and as a consequence the simulators expect an operation to decompose into an OP_TREE acting only a subset ofop.qubits
.To mitigate this issue, one easy option is to recursively decompose the input circuit till we get to a point where the resulting circuit is a "trivial" circuit, i.e. has no more implicit allocations / deallocations inside the decompose protocol of operations. We can introduce additional protocols to keep track of which operation is "trivial" or not so we only decompose the operations that have a hidden allocation.
Another option that comes to mind, but I'm not fully sure if we can implement efficiently in the existing cirq simulator infrastructure, is to come up with a way of computing the effect of a "non-trivial" operation (one that has hidden allocations) only on the qubits that it expects as input (for example, by computing a reduced density matrix). This should be possible because there's an implicit promise that the allocated ancillas should be cleaned up when the user calls
qfree(q)
and therefore an outer simulator should not have to care about them. cc @daxfohl I'm curious to hear your thoughts on this.Example code snippets
Simple Qubit Manager with post-process transformer
```python class GateAllocAndBorrowInDecompose(cirq.Gate): def __init__(self, num_alloc: int = 1): self.num_alloc = num_alloc def _num_qubits_(self) -> int: return 1 def __str__(self) -> str: return 'TestGate' def _decompose_(self, qubits): qa, qb = cqa.qalloc(self.num_alloc), cqa.qborrow(self.num_alloc) for q, b in zip(qa, qb): yield cirq.CSWAP(qubits[0], q, b) yield cirq.qft(*qb).controlled_by(qubits[0]) for q, b in zip(qa, qb): yield cirq.CSWAP(qubits[0], q, b) cqa.qfree(qa + qb) def test_decompose_and_allocate_qubits(): with cqa.memory_management_context(): op = GateAllocAndBorrowInDecompose(3).on(cirq.NamedQubit("original")) extra = cirq.LineQubit.range(3) circuit = cirq.Circuit(cirq.H.on_each(*extra), cirq.Moment(op), cirq.decompose_once(op)) cirq.testing.assert_has_diagram( circuit, """ _b0: ───────────────────────×───────────qft───×─────────── │ │ │ _b1: ───────────────────────┼───×───────#2────┼───×─────── │ │ │ │ │ _b2: ───────────────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ _c0: ───────────────────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ _c1: ───────────────────────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ _c2: ───────────────────────┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ 0: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ 1: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ 2: ──────────H──────────────┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ original: ───────TestGate───@───@───@───@─────@───@───@─── """, ) allocated_circuit = cqa.decompose_and_allocate_qubits(circuit, decompose=lambda op: op) cirq.testing.assert_has_diagram( allocated_circuit, """ 0: ───────────H──────────────×───────────qft───×─────────── │ │ │ 1: ───────────H──────────────┼───×───────#2────┼───×─────── │ │ │ │ │ 2: ───────────H──────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ ancilla_0: ──────────────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ ancilla_1: ──────────────────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ ancilla_2: ──────────────────┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ original: ────────TestGate───@───@───@───@─────@───@───@─── """, ) def decompose_func(op): return ( cirq.decompose_once(op) if isinstance(op.gate, GateAllocAndBorrowInDecompose) else op ) allocated_and_decomposed_circuit = cqa.decompose_and_allocate_qubits( circuit, decompose=decompose_func ) cirq.testing.assert_has_diagram( allocated_and_decomposed_circuit, """ 0: ───────────H───×───────────qft───×───────────×───────────qft───×─────────── │ │ │ │ │ │ 1: ───────────H───┼───×───────#2────┼───×───────┼───×───────#2────┼───×─────── │ │ │ │ │ │ │ │ │ │ 2: ───────────H───┼───┼───×───#3────┼───┼───×───┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_0: ───────×───┼───┼───┼─────×───┼───┼───×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_1: ───────┼───×───┼───┼─────┼───×───┼───┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_2: ───────┼───┼───×───┼─────┼───┼───×───┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ original: ────────@───@───@───@─────@───@───@───@───@───@───@─────@───@───@─── """, ) # Since we preserve moment structure, if TestGate is in the first moment then # we end up allocating 6 ancilla to not introduce any additional data dependency # due to our allocation strategy and thus not impact the circuit depth. allocated_and_decomposed_circuit = cqa.decompose_and_allocate_qubits( cirq.align_left(circuit), decompose=decompose_func ) cirq.testing.assert_has_diagram( allocated_and_decomposed_circuit, """ 0: ───────────H─────────────────────────────×───────────qft───×─────────── │ │ │ 1: ───────────H─────────────────────────────┼───×───────#2────┼───×─────── │ │ │ │ │ 2: ───────────H─────────────────────────────┼───┼───×───#3────┼───┼───×─── │ │ │ │ │ │ │ ancilla_0: ───×───────────qft───×───────────×───┼───┼───┼─────×───┼───┼─── │ │ │ │ │ │ │ │ │ │ ancilla_1: ───┼───×───────#2────┼───×───────┼───×───┼───┼─────┼───×───┼─── │ │ │ │ │ │ │ │ │ │ │ │ ancilla_2: ───┼───┼───×───#3────┼───┼───×───┼───┼───×───┼─────┼───┼───×─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_3: ───×───┼───┼───┼─────×───┼───┼───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_4: ───┼───×───┼───┼─────┼───×───┼───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ ancilla_5: ───┼───┼───×───┼─────┼───┼───×───┼───┼───┼───┼─────┼───┼───┼─── │ │ │ │ │ │ │ │ │ │ │ │ │ │ original: ────@───@───@───@─────@───@───@───@───@───@───@─────@───@───@─── """, ) ```
Greedy Qubit Manager
```python class GateAllocInDecompose(cirq.Gate): def __init__(self, num_alloc: int = 1): self.num_alloc = num_alloc def _num_qubits_(self) -> int: return 1 def _decompose_(self, qubits): for q in cqa.qalloc(self.num_alloc): yield cirq.CNOT(qubits[0], q) cqa.qfree([q]) def test_greedy_qubit_manager(): def make_circuit(): q = cirq.LineQubit.range(2) g = GateAllocInDecompose(1) circuit = cirq.Circuit(cirq.decompose_once(g.on(q[0])), cirq.decompose_once(g.on(q[1]))) return circuit with cqa.memory_management_context(cqa.GreedyQubitManager(prefix="ancilla", size=1)): # Qubit manager with only 1 managed qubit. Will always repeat the same qubit. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ 0: ───────────@─────── │ 1: ───────────┼───@─── │ │ ancilla_0: ───X───X─── """, ) with cqa.memory_management_context(cqa.GreedyQubitManager(prefix="ancilla", size=2)): # Qubit manager with 2 managed qubits and parallelize=True, tries to minimize adding additional # data dependencies by minimizing reuse. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ ┌──┐ 0: ────────────@───── │ 1: ────────────┼@──── ││ ancilla_0: ────X┼──── │ ancilla_1: ─────X──── └──┘ """, ) with cqa.memory_management_context( cqa.GreedyQubitManager(prefix="ancilla", size=2, parallelize=False) ): # Qubit manager with 2 managed qubits and parallelize=False, tries to minimize reuse by potentially # adding new data dependencies. circuit = make_circuit() cirq.testing.assert_has_diagram( circuit, """ 0: ───────────@─────── │ 1: ───────────┼───@─── │ │ ancilla_1: ───X───X─── """, ) ```
Conclusion
I'd love to hear thoughts of other maintainers and contributors, specifically @daxfohl @dstrain115 @dabacon @maffoo. Once there is consensus, I'll create smaller subtasks which can be taken up by 20%ers and other community contributors.
@NoureldinYosri and @mpharrigan helped with this exploration and have given a round of feedback offline.
Update - 3rd April 2023
See the comments for more discussions on maintaining a global state for
qalloc
/qborrow
vs passing around thequbit_manager
as part of the decompose protocol tocirq.Gate
andcirq.Operation
. Right now, we have a working prototype without maintaining a global state. Here, we list down all the subtasks for implementing this roadmap item, many of which can be worked upon in parallel.QubitManager
interface and aSimpleQubitManager
implementation, which always allocates newCleanQubits
/BorrowableQubits
for eachqalloc
/qborrow
request._decompose_with_qubit_manager_
protocol for bothcirq.Gate
andcirq.Operation
.cirq.decompose
and it's variants to first try_decompose_with_qubit_manager_
and fallback on_decompose
.GreedyQubitManager
that maximizes/minimizes qubit reuse based on a configurable parameter.CleanQubit
/BorrowQubit
in a circuit to system qubits using a qubit manager.