A bug about the width metric when setting OptimizeDepth = true

hzy-cas commented 3 years ago

Describe the bug

Pullzed by the width metric for some operations when setting OptimizeDepth = true .

To Reproduce

operation MTest1() : Unit
{
    body (...) {
    use a = Qubit[8] {
        CNOT(a[0], a[1]);
        CNOT(a[0], a[2]);
        CNOT(a[0], a[3]);
        CNOT(a[0], a[4]);
        CNOT(a[0], a[5]);
        CNOT(a[0], a[6]);
        CNOT(a[0], a[7]);
        }
    }
}

operation MTest2() : Unit { body (...) { use (a,b) = (Qubit[7],Qubit()) { CNOT(a[0], a[1]); CNOT(a[0], a[2]); CNOT(a[0], a[3]); CNOT(a[0], a[4]); CNOT(a[0], b); CNOT(a[0], a[5]); CNOT(a[0], a[6]); } } }

Expected behavior

It is obivous that for MTest1, all 8 qubits are used, and all these CNOT gates cannot be executed in parallel. Hence the width should be 8. For MTest2, the result should be the same.

Actual behavior

For MTest1

The resource estimator outputs:

Metric Sum Max CNOT 7 7 QubitClifford 0 0 R 0 0 Measure 0 0 T 0 0 Depth 0 0 Width 2 2 QubitCount 8 8 BorrowedWidth 0 0

For MTest2 The resource estimator outputs:

Metric Sum Max CNOT 7 7 QubitClifford 0 0 R 0 0 Measure 0 0 T 0 0 Depth 0 0 Width 3 3 QubitCount 8 8 BorrowedWidth 0 0

System information

OS: [e.g. Windows, macOS, Redhat, Ubuntu] Windos
.NET Core Version: [e.g. the output of dotnet --version] 3.1.408
IQ# Version [e.g. the output of dotnet iqsharp --version] iqsharp: 0.15.2101125897 Jupyter Core: 1.5.0.0 .NET Runtime: .NETCoreApp,Version=v3.1
Python Version (if applicable) [e.g. the output of python --version]

Additional context

At first, I think maybe the width metric is the maxium width of a layer of the circuit. This can explain the result of MTest1. However, for MTest2, each CNOT gates should be executed sequentially, hence the width of each layer should also be 2, but the actual result is 3.
Therefore, I think this may be a bug about the width counter, or I misunderstand the meaning of width metric and the qubits resuing strategy when setting OptimizeDepth = true.

DmitryVasilevsky commented 3 years ago

Thank you @hzy-cas for reporting this issue! The problem isn’t as straightforward as it seems.

Why is the width of the MTest1 circuit gets computed as 2? The simple answer is that the depth-width algorithm thinks that the circuit is equivalent to

            use a0 = Qubit() {
                use a1 = Qubit() { CNOT(a0, a1); }
                use a2 = Qubit() { CNOT(a0, a2); }
                use a3 = Qubit() { CNOT(a0, a3); }
                use a4 = Qubit() { CNOT(a0, a4); }
                use a5 = Qubit() { CNOT(a0, a5); }
                use a6 = Qubit() { CNOT(a0, a6); }
                use a7 = Qubit() { CNOT(a0, a7); }
            }

This circuit can reuse the same qubit for qubits a1, a2, a3, a4, a5, a6, and a7, therefore only 2 qubits are necessary: one for the a0 and one for the rest of them. The big question is, of course, if these circuits are equivalent. As written, that may be true: all qubits will have their default value of 0 and on top of that there’re no measurements to read any results. There may be similar circuits where entanglement is actually exploited and where early release of a qubit may alter the quantum state of the system. I will investigate the issue to find better examples and see how this can be fixed.

Now to the second question. Why is the width of the MTest2 circuit gets computed as 3? This is simply because the greedy width-depth algorithm only finds a reasonable depth-width, not the optimal depth-width. And in this case, it cannot find the solution with two qubits. Busy/idle time and possible reuse are computed when qubits go out of scope. When a[0] goes out of scope, it is marked as busy from the beginning of the first CNOT till the end of the last CNOT. When qubits a[1] through a[6] go out of scope, the algorithm reuses the same underlying qubit for them and also marks it as busy from the beginning of the first CNOT till the end of the last CNOT. It doesn’t track the idle time for the duration of the fifth CNOT. Then b is assigned a new qubit because there’s no known idle qubit for the duration of the fifth gate. So, the computed width becomes 3.

To illustrate the approach, the width of the following circuit will be computed as 5.

            use a = Qubit[8] {
                CNOT(a[0], a[1]);
                CNOT(a[0], a[3]);
                CNOT(a[0], a[5]);
                CNOT(a[0], a[7]);
                CNOT(a[0], a[6]);
                CNOT(a[0], a[4]);
                CNOT(a[0], a[2]);
            }

As one qubit will be counted for a[0], one for a[1] and a[2], one for a[3] and a[4], one for a[5] and a[6], and one for a[7]. There are better algorithms which we may consider in future (especially for the known interesting cases). However, we cannot expect to have the optimal algorithm as that would be prohibitively expensive computation-wise.

Thanks again for reporting the issue.

hzy-cas commented 3 years ago

Hi Vasilevsky @DmitryVasilevsky,

I found that if some measurements are added at the end, then the width will become 3. I think in this case, no qubits can be reused.

operation MTest() : Result[]
{
  mutable res = new Result[8];
  use a = Qubit[8]
  {
        H(a[0]);
        CNOT(a[0], a[1]);
        CNOT(a[0], a[2]);
        CNOT(a[0], a[3]);
        CNOT(a[0], a[4]);
        CNOT(a[0], a[5]);
        CNOT(a[0], a[6]);
        CNOT(a[0], a[7]);
        for i in 0..7
        {
         set res w/= i <- M(a[i]);
        }
        return res;         
  }    
}

Since for quantum circuit design problems, in most times, the desinger knows when qubits are idle, could we add a new feature that we can close the automatic reuse and manually decide which qubits can be reused ？

DmitryVasilevsky commented 3 years ago

Thanks for the updated example @hzy-cas ! I was thinking along these lines and figured out that such reordering and reuse is correct, even though counterintuitive. First let’s look at this new example.

Hadamard gate puts the a[0] into superposition w(|0>+|1>), and subsequent CNOTs entangle them it with the rest to make w(|0> + |255>) state, or in simple terms, they are all equal. Once you measure the a[0], the state collapses, with 50% chance being |0> and 50% chance being |255>. Subsequent measurements on qubit a[i] return the same result as the first. So you’ll get [0,0,0,0,0,0,0,0] or [1,1,1,1,1,1,1,1] in res array with 50% chance.

Now let’s look at how we can save qubits. Gates on different qubits commute, including measurements. Using this rule, we cannot reorder CNOTs because they share a[0], but we can move measurements to be performed earlier. Let’s look at the following rewrite of the code. I will use fictitious functions _allocate/_release to mark actual allocation and release points for the qubits.

   _allocate(a[0]);
H(a[0]);
   _allocate(a[1]);
CNOT(a[0], a[1]);
   _allocate(a[2]);
CNOT(a[0], a[2]); M(a[1]);
   _release(a[1]); _allocate(a[3]); // a[3] is reusing a[1]
CNOT(a[0], a[3]); M(a[2]);
   _release(a[2]); _allocate(a[4]); // a[4] is reusing a[2]
CNOT(a[0], a[4]); M(a[3]);
   _release(a[3]); _allocate(a[5]); // a[5] is reusing a[3], a[1]
CNOT(a[0], a[5]); M(a[4]);
   _release(a[4]); _allocate(a[6]); // a[6] is reusing a[4], a[2]
CNOT(a[0], a[6]); M(a[5]);
   _release(a[5]); _allocate(a[7]); // a[7] is reusing a[5], a[3], a[1]
CNOT(a[0], a[7]); M(a[6]);
   _release(a6);
M(a[0]); M(a[7]);
   _release(a0); _release(a[7]);

Now, there’re several questions to this rewrite and I will try to address them all.

Does it produce the same result? Yes, this is equivalent to the original example. We can see that everything up to M(a[1]) creates the state w(|0>+|7>). Measurement of a[1] then collapses the sate to either |0> or |7> and returns 1 with 50% chance and 0 with 50% chance. It is easy to see that subsequent measurements return the same value. So you get the same result with the same probabilities.

How many qubits are used? We use one qubit for a[0], one for a[1], a[3], a[5], a[7], and one for a[2], a[4], a[6]. So 3 qubits is the width.

Why can’t we use 2 qubits reusing the same qubit for a[1]…a[7]? Here we optimize for depth in the first place. To reuse the same qubit for a[1] and a[2] we would have to complete M(a[1]) before CNOT(a[0], a[2]). If we don’t reuse the same qubit for a[1] and a[2], M(a[1]) is assumed to be performed at the same time as CNOT(a[0], a[2]) resulting in lower depth. So the calculated width of 3 qubits instead of 2 is not an inefficiency of width-depth calculation but rather a necessity.

If you want full control of qubit reuse you can allocate them in the beginning and pass them around, That way automatic allocation and reuse will be out of the way. That isn't ideal, and we are considering control of qubit reuse in future. This is especially necessary in a parallel setting as qubit reuse may reduce parallelism.

DmitryVasilevsky commented 3 years ago

Behavior and calculations are correct. I updated the algorithm to handle the case where several qubits are released at the same time. Now the example I posted with several qubits released out of order is also reported as depth 2. One needs to qubits at the same time to benefit from this optimization. An example where one qubit is released separately will still be reported as depth 3.

microsoft / qsharp-runtime