Allocated qubits not reused when optimizing for depth

sam-jaques commented 4 years ago

Describe the bug

The depth-width optimizer added in version 0.13.20102604 ensures that if several simultaneous operations allocate and release qubits, they do not share the qubits. However, sequential operations should be able to re-use a released qubit, but the width counts in the depthCounter of the QCTraceSimulator do not reflect this when OptimizeDepth is true.

To Reproduce

The code below allocates 10 qubits, then performs an operation on each qubit that requires allocating one extra qubit. The compiler has a choice to allocate 10 qubits at once and finish quickly, or allocate 1 qubit at a time and finish slowly. However, if this same operation is repeated 10 times, it should be able to reuse the allocated qubits either way.

    operation SimultaneousUse() : Unit {
        body (...){
            using (register = Qubit[10]){
                for (idx in 0..9){
                    SomeOp(register[idx]);
                }
                ClearRegister(register);
            }
        }
    }
     operation SequentialUse() : Unit {
        body (...){
            using (register = Qubit[10]){
                for (idy in 0..10){
                    for (idx in 0..9){
                        SomeOp(register[idx]);
                    }
                }
                ClearRegister(register);
            }
        }
    }

    operation SomeOp(test : Qubit) : Unit {
        body (...){
            using (spareQubit = Qubit()){
                T(test);
                CNOT(test, spareQubit);
            }
        }
    }

   operation ClearRegister(register:Qubit[]):Unit {
        for (idx in 0..Length(register)-1){
            AssertMeasurementProbability([PauliZ],[register[idx]],Zero,0.0,"n/a",0.5);
        }   
        ResetAll(register);
    }

Expected behavior

When the QCTraceSimulator is set for OptimizeDepth = true, then the width of SimultaneousUse should be 20, and depth (of all gates) should be 4. SequentialUse should have depth 24 and also have width 20.

Actual behavior

SequentialUse returns 120 for both WidthAverage and WidthMax. T

More specifically, if sim is the trace simulator, then sim.ToCSV()[MetricsCountersNames.depthCounter] returns 120 under the WidthAverage and WidthMax columns. sim.ToCSV()[MetricsCountersNames.widthCounter] returns 0 for InputWidthAverage and 11 for ExtraWidthAverage.

System information

.NET Core Version: 3.1.403
Q# Version 0.13.20102604

bamarsha commented 4 years ago

@DmitryVasilevsky, could you take a look at this behavior with sequential operations in the tracer?

DmitryVasilevsky commented 4 years ago

This is actually expected behavior when optimizing for depth. Reuse of qubits may create dependencies that increase circuit depth.

sam-jaques commented 4 years ago

Qubit reuse may cause dependencies, but in the example given it does not, since SomeOp must finish and release its qubit before it is called again (as I understand, qubit release is not regarded as a measurement and should have have depth 0). In the use where I noticed this, the depth-optimizing compilation increased qubit count from 207 to 10679 for the same operation.

In the documentation here: https://docs.microsoft.com/sl-si/quantum/user-guide/machines/resources-estimator it says, regarding width:

Please note, that reused qubits do not contribute to this number. I.e. if a few qubits have been released before operation A starts, and all qubit demanded by this operation A (and operations called from A) were satisfied by reusing previously release qubits, the Width of operation A is reported as 0.

and does not mention under the "OptimizeDepth=true:" paragraph that this property no longer holds.

My understanding of circuit optimization is that detecting and optimizing dependencies is generally a very hard problem, but this seems like an easier case: it would be ideal if the compiler could notice that if a qubit is released at depth X, then any operation starting at a depth Y > X can reuse that qubit. Should I re-label this as a feature request?

DmitryVasilevsky commented 4 years ago

@sam-jaques : Thank you for filing this issue and explaining your example. Yes, converting this issue to a feature request would be best. As you said, detecting and optimizing dependencies is generally a very hard problem. We've been considering various approaches to the depth-optimized case, and they are all either prohibitively expensive in terms of computation resources or do not provide optimal solution. So we decided to start with something simple and easy to understand although it's quite unexpected at first glance. The paragraph you quoted is technically still correct in OptimizeDepth=true case. It says if qubits are reused, they are not counted. As qubits are not reused, they are counted. Agreed, this case may warrant a special mention to avoid confusion.

sam-jaques commented 4 years ago

From my limited understanding of the compiler, every qubit i has a depth d_i, and when a gate of depth d_g is applied to qubits i1,...,in, it sets the depth of each one to be max(d_i1,...,d_in) + d_g. It seems like if releasing a qubit is treated as a gate, the compiler could track which qubits were released, and at what depths. Then when it wants to allocate a new qubit, it could wait until the first gate is applied to that new qubit, compute the minimum depth when this gate is applied, and then check all released qubits to see if there is one that was released before that depth (perhaps choosing whichever one was released most recently) and safely reallocate it; if no such qubit exists, then add a new one. This is a very ad-hoc solution, but I think it would fix the example I gave.

I can't find a way to change the label of this issue, unfortunately.

DmitryVasilevsky commented 4 years ago

Hi @sam-jaques!

I am working to resolve this issue and here’s a quick update. Thanks for the proposal, we are definitely considering this approach. It has nice properties such as being relatively straightforward and simple to implement. While it may behave well in your situation, unfortunately it isn’t handling some other situations well. For a simple example, consider a program where every qubit starts with a single gate on it, say H gate. In that case these gates will be all be scheduled to start at time 0 and qubits will not be reused at all. This example shows that this question is not so much about assigning and reusing qubits, but mostly about scheduling gates at appropriate times. Scheduling is typically a hard problem.

As execution of quantum program proceeds, more and more gates become known. The gates aren’t scheduled to be executed in this order. Algorithm that finds minimum depth schedules these gates to be executed at times appropriate to achieve minimum depth. Current greedy algorithm will result in a schedule that fails to achieve optimal width given this minimum depth. In general, we cannot hope to compute optimal width in this case, such computation is too costly, but we do want to achieve reasonable width. So, I’m considering several heuristics and algorithms to do this.

I’m considering greedy algorithm based on ideas similar to Kruskal's algorithm for MST, which would compute reasonable availability time for qubits. I’m also considering some heuristics, which unfortunately may have negative effect on depth. So, there’s another idea to compute minimum depth and corresponding width as it is now, but also add a pair with a depth close to minimum and compatible reasonable width. All these approaches require post-processing step where availability times for qubits are used to compute qubit reuse and hence the circuit width. This doesn’t work well with the requirement for this algorithm to be an “on-line” stream processing algorithm: as it processes incoming gates, tracer needs to report width and height at the exit of each scope. So many considerations should be taken into account to make the decision on potential solution.

I’ll update this thread when we come up with a reasonable solution. Comments are always welcome!

DmitryVasilevsky commented 3 years ago

We have settled on a potential solution to this problem. I'm currently cleaning up and testing the code brining it from prototype to production. Prototype is located here:

https://github.com/microsoft/qsharp-runtime/tree/dmitryv/QubitReuseWithMinimalDepth

Whitepaper describing chellenges and current approach is here:

https://github.com/microsoft/qsharp-runtime/blob/dmitryv/QubitReuseWithMinimalDepth/src/Simulation/Simulators/QCTraceSimulator/Docs/Width%20and%20Depth%20in%20the%20Tracer.docx

microsoft / qsharp-runtime