Mitigating confusing global phase

tcNickolas commented 7 months ago

Discussion landed at a few concrete tasks to help with mitigating how confusing global phase in simulation can be:

[x] #1370
[x] #1388
[x] #1450
[x] https://github.com/qir-alliance/qir-runner/issues/162
[x] Pull in fix for Rx/Ry phase by updating simulator tag (#1461)
[x] Add equivalence check member function for StateDump in Python to make checking states easier with global phase (#1372)
[x] Ensure DumpRegister does not add phase on display (#1461)
[x] https://github.com/qir-alliance/qir-runner/pull/173
[x] Pull in fix for Rx/Ry of 2 Pi by updating simulator tag (#1551)
[x] https://github.com/qir-alliance/qir-runner/pull/186
[x] Pull in fix for rotations by 0.0 by updating simulator tag (#1641)

Describe the bug

QDK sparse simulator introduces a global phase at random that shows up in DumpMachine output (and the outputs derived from it, such as dump_operation).

This is very confusing for several ways:

Pedagogically, global phase is hard enough to explain on its own, and if it appears randomly in contexts where there's nowhere for it to come from, this is very confusing for the learner (in the katas output, in assignments, in experimentation they do on their own, and so on.)
Writing tests becomes a lot more elaborate compared, say, to Qiskit: where in Qiskit I would just write assert state_vector == pytest.approx(a) to compare the state vectors, in Q# I have to first detect the global phase difference between the expected state and the actual state and then as I loop over the elements to remember to include it in every element comparison.

To Reproduce

Several examples:

If you run our sample BellState.qs several times, sometimes the last two states have an extra -1 phase.
The test TestPreparationCompletion in our libraries prepares states with real amplitudes, but the expected DumpMachine output (starting with states that have both positive and negative) has complex amplitudes.
In the katas, phase oracles that are different by a global phase produce the same output when applied to the same state.

If I write a simple test for a state preparation operation that uses only Ry and Controlled Ry gates (no gates that might introduce any complex numbers), two of the basis states in the test (the second and the fourth) will acquire a complex phase i.

@pytest.mark.parametrize("a",
[ [1., 0., 0., 0.],
  [0., 1., 0., 0.],
  [0., 0., 1., 0.],
  [0., 0., 0., 1.]]
def test_prep_two_qubit(a):
qsharp.init(project_root='.')
qsharp.eval(f"use qs = Qubit[2]; StatePreparation.PrepTwoQubits(qs, {a});")
state = qsharp.dump_machine()
print(state)
# Do the testing

Expected behavior

Simulator not introducing a random global phase in vast majority of scenarios (for example, when there are only real-valued gates involved).

swernli commented 7 months ago

I think the subtle difference (which was present in the Legacy QDK as well when using sparse simulation) was that phase does not get cleared when using X to return to ground state, only when all qubits are released. So for example, this playground program shows that Z on a qubit in the |1⟩ state produces the expected global phase (which is important for education) but that phase persists after an X returns the qubit to the ground state, leaving the qubit in the -|0⟩ state. That seems correct, and explains the behavior you describe above, because internally during simulation there is no Reset or MResetZ just a conditionally applied X to return to |0⟩ after measurement. So what would be the right change to make here that both avoids confusion for more complex programs and exercise validation but doesn't break the educational value of showing how XZX can introduce a global phase?

tcNickolas commented 6 months ago

Mathematically, this is correct behavior. But our documentation for Reset and MResetZ promises "ensures that the qubit is returned to |0⟩", which is incorrect description of its actual effect.

In most of the scenarios I'm concerned about (and in most questions I got on this topic), the global phase shows up without any measurements being involved. (And the BellState.qs example would be better off with qubits deallocated after each state and allocated from scratch, since we're not really reusing them.)

swernli commented 6 months ago

Thanks for digging into this. There is definitely a lot going on, and it's going to be tricky finding the right balance between "mathematically correct but confusing" on the one hand, and "more predictable but less rigorous" on the other. I tried finding a minimal repro of sorts to show the limitations of how simulation treats global phase, and I think I have a good one to foster continued discussion. You make a good point that focusing on reset and making sure it actually puts the qubit back to the |0⟩ state, so I've starting thinking how we might do that consistently. But even when there is no reset or measurement we can still run into problems. Here's a sample program:

namespace MyQuantumProgram {

    open Microsoft.Quantum.Diagnostics;
    @EntryPoint()
    operation Main() : Unit {
        {
            use outer = Qubit();
            Message("outer before");
            DumpMachine();

            {
                use inner = Qubit();
                Message("outer + inner");
                DumpMachine();

                X(inner);
                Z(inner);
                X(inner);
                Message("outer + inner_phase");
                DumpMachine();
            }

            Message("outer after");
            DumpMachine();
        }

        Message("all done");
        DumpMachine();
    }
}

When you run that, you'll see an example of odd phase behavior that seems mathematically incorrect:

The qubit that had Z applied gets a phase as expected, but when it goes out of scope the phase "jumps" to the other qubit, even without any entanglement, superposition, measurement or reset in the picture. That's because the simulator just tracks the "0" state (with some padding for number of qubits) but doesn't know which parts of the phase in that state are associated with which qubits. At least, the update to the sparse simulator in https://github.com/qir-alliance/qir-runner/pull/66 causes it clean up this phase once all qubits go out of scope. Compare this to the classic QDK which had the same quirk but without the better result on the "all done" state (note: I had to use the older use outer = Qubit() { ... } syntax to get the explicitly scoped qubits):

outer before
|0⟩      1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ]     --- [  0.00000 rad ]
|1⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
outer + inner
|0⟩      1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ]     --- [  0.00000 rad ]
|1⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
|2⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
|3⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
outer + inner_phase
|0⟩     -1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ] ---     [  3.14159 rad ]
|1⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
|2⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
|3⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
outer after
|0⟩     -1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ] ---     [  3.14159 rad ]
|1⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]                   
all done
|0⟩     -1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ] ---     [  3.14159 rad ]

Even with the old full state simulator, the phase still "jumps" and worse persists even across all qubits being released.

So what would be the desirable behavior for a sample like this? Even if the qubit was explicitly reset before going out of scope, the sparse state simulator will only know that qubit 1 is being reset/released and the current state vector is a single entry with index 0 and amplitude -1.0. How can it know which qubit to associate that negative phase to and when to drop it? Is it possible we need some record of when phase gets applied and which qubit it is applied to?

tcNickolas commented 6 months ago

In this particular example I'm more bothered by DumpMachine outputting |0> when there are no qubits allocated...

I don't think we can say the phase is applied to a specific qubit - for example, we can apply a global phase -1 to a |11> state using Controlled Z to find ourselves a phase associated with two qubits, and multi-controlled Z to apply it to as many |1> qubits as we want. In this scenario, unfortunately mathematically it makes sense to have the global phase stick around.

What I'd really like to do is to track down the root causes of phases appearing in the examples I gave, and figure out how to make those behave. I don't think I ever noticed running into issues with a global phase persisting when measuring or deallocating qubits - our BellState example is the first time this happened, and the second example is the kind of code I've never written. But the issues that occur without measurements/deallocation look very confusing.

[ ] I think we should update the BellState sample to allocate qubits inside the loop instead of resetting them on each iteration, and remove ResetAll from individual Prepare... operations. State preparation routines usually assume the qubits already start in |0> state (for example, our PreparePureStateD) to make sure they have adjoints, so it would be consistent with that. And I checked that the -1 phase doesn't show up with that fix.
[ ] We should probably take a look at our other samples and see if anything suspicious phase-wise shows up.
[ ] The next example is phase oracles. The simplest one is Deutsch kata (in the playground only), exercise "Oracle for f(x) = 1 - x". There are two correct solutions for phase flip on |0>, XZX and ZR(PauliI). The output produced by ShowQuantumStateComparison is:
- Initial state 0.6|0〉 + 0.8|1〉 - correct
- The state after applying reference solution Z(x); followed by R(PauliI, 2.0 * PI(), x); is 0.6|0〉 - 0.8|1〉 - incorrect, it should be -0.6|0〉 + 0.8|1〉. There are no measurements or deallocations. I know the simulator does not simply ignore R gate because the solution passes and it doesn't for Z gate alone.
- If I use reference solution XZX instead, the output is correct, -0.6|0〉 + 0.8|1〉
- What is the difference between these two scenarios?

tcNickolas commented 6 months ago

Is it that R(PauliI) is ignored unless it's controlled?

swernli commented 6 months ago

I like this list of tasks. Would you be ok with updating the description to add those tasks for tracking and update the title to something like, "Mitigating confusing global phase" instead? Then we can take them one at a time or promote them into sub-issues if appropriate. I'm happy to do the edits if you prefer (I'll preserve the original description at the bottom).

To that list, I'd add one other task:

[ ] Add equivalence check member function for StateDump in Python to make checking states easier with global phase.

I think this would help with the authoring of Python test cases, and it can explain in that it's not mathematical equivalence but state equivalence that ignores the global phase.

Is it that R(PauliI) is ignored unless it's controlled?

Yes, R is implemented as: https://github.com/microsoft/qsharp/blob/d88711ee1cfc0b043edc1d98c9366ce9087112f7/library/std/intrinsic.qs#L397-L408 And that utility ApplyGlobalPhase only operates on the control qubits and is no-op otherwise: https://github.com/microsoft/qsharp/blob/d88711ee1cfc0b043edc1d98c9366ce9087112f7/library/std/internal.qs#L30-L41

tcNickolas commented 6 months ago

Sure, go ahead with the edits and/or splitting into separate subtasks.

I think we need to change that behavior to have R(PauliI) act even if it's not controlled. It's super confusing when talking about phase oracles to have different oracles with the same matrix when they should really have different matrices (and to have different ways to get the same global phase yield different results), I got so many questions about that.

For the state equivalence check, are you thinking about comparing two dictionaries (as used by our StateDump)? That would not help my scenario, since in it my argument is a plain list of amplitudes... Maybe one more util to convert list of amplitudes into a StateDump type? Could put in into qsharp_utils, same as dump_operation

swernli commented 6 months ago

For the state equivalence check, are you thinking about comparing two dictionaries (as used by our StateDump)? That would not help my scenario, since in it my argument is a plain list of amplitudes... Maybe one more util to convert list of amplitudes into a StateDump type? Could put in into qsharp_utils, same as dump_operation

I was actually thinking something like this (easier to code it up than try to describe it): https://github.com/microsoft/qsharp/blob/dd6b8940b7dd83b8169c852774c843a1e4b67c75/pip/tests/test_qsharp.py#L101-L112 It takes a dictionary and performs the check by ensuring the states are equivalent by ensuring the same non-zero indices are present and share the same phase.

tcNickolas commented 6 months ago

The next issue I notice is with R1 gate: it's supposed to introduce global phase only for |1⟩, but a global phase creeps in.

    use qs = Qubit[2];
    H(qs[0]);
    R1(PI(), qs[0]);

is supposed to convert the state to |-⟩|0⟩ (R1(PI()) is effectively the Z gate), but there's a relative phase involving i there. This looks like a side effect of ignoring R(PauliI), if we decompose R1 into R(PauliZ) followed by R(PauliI), but the controlled version of R1 also introduces a weird global phase:

    use qs = Qubit[2];
    H(qs[0]);
    H(qs[1]);
    Controlled R1([qs[0]], (PI(), qs[1]));

swernli commented 6 months ago

I think you are on to something critical with the difference in behavior vs expectation for R(PauliI). If I look back at the old qsharp-runtime repo, I can see the C++ full-state simulator would add a global phase via the G gate for R(PauliI), even when not controlled. However, our decompositions for hardware (refactored into the new std lib as quoted above) and the classic C++ sparse-state simulator both treat R(PauliI) without controls as a no-op (a little farther down in the file, the multi-controlled R uses the same logic that is in the decomposition of only operating on the control qubits for PauliI). Just to drive the point home, I tried the following code in classic QDK full-state, classic sparse-state, and modern sparse-state simulators:

        use q = Qubit();
        R(PauliI, PI(), q);
        DumpMachine();
        Reset(q);

The classic full-state simulator applies a global phase of -𝑖:

|0⟩      0.000000 + -1.000000 i  ==     ******************** [ 1.000000 ]    ↓    [ -1.57080 rad ]
|1⟩      0.000000 +  0.000000 i  ==                          [ 0.000000 ]

while both classic and modern sparse-state simulator resulted in a no-op:

|0⟩      1.000000 +  0.000000 i  ==     ******************** [ 1.000000 ]     --- [  0.00000 rad ]

This use of G to apply global phase was possible before because R was an intrinsic gate and the internals of the simulator could decide how to apply the rotation based on the Pauli basis passed in, whereas the modern approach of more closely mimicking hardware means we rely on combinations of the Rx, Ry, and Rz intrinsic gates to make up the R gate. It seems odd to me to have R(PauliI) effectively behave as R(PauliZ) with the angle divided by 2, but I must confess I'm not sure where to draw the line in this case between what is simulated and what is compiled for execution on hardware. Would the mathematically correct (and less confusing) application of R(PauliI) be something that occurs only in simulation and gets dropped when running on hardware?

Between the decompositions for hardware and the explicit behavior of the sparse-state simulator, it seems like someone made an intentional choice to change the global phase behavior of R(PauliI), but I don't think we have enough information to understand why, which makes it hard to justify relative to the confusion that approach to global phase causes.

tcNickolas commented 6 months ago

I'm still not sure what happens with the Controlled R1, since controlled variants of R1 gate should not introduce a global phase...

I think we need to dig deeper into this intentional choice - I'm not convinced that there is any benefit in this behavior, and there is definitely a lot of confusion introduced by it. If I cannot even use Controlled variants of a gate to get rid of the global phase, I'm not sure I trust the decompositions to be accurate, so I won't get to running the code on hardware.

swernli commented 6 months ago

Ok, I think I may finally understand where the oddities of phase are coming from. It seems that across the three simulators we used (classic full-state, classic sparse-state, and modern sparse-state), we made slightly different shortcuts for execution that create different behavior. Each of these different behaviors could be defended under the banner of "states that differ only by a global phase represent the same physical system," but they do not help with educating on the mathematical behaviors of the gates. I'm going to try to put it all together in this comment, which will get long...

First, the interplay between Rz, Ri, and R1. In the classic full-state simulator, these were each distinct operations, with Ri applying a global phase (aka the G or Ph gate). You can see this by checking the behavior of each gate when applied to the |+⟩ state with a parameter of PI():

Rz:
|0⟩      0.000000 + -0.707107 i  ==     ***********          [ 0.500000 ]    ↓    [ -1.57080 rad ]
|1⟩      0.000000 +  0.707107 i  ==     ***********          [ 0.500000 ]    ↑    [  1.57080 rad ]

Ri:
|0⟩      0.000000 + -0.707107 i  ==     ***********          [ 0.500000 ]    ↓    [ -1.57080 rad ]
|1⟩      0.000000 + -0.707107 i  ==     ***********          [ 0.500000 ]    ↓    [ -1.57080 rad ]

R1:
|0⟩      0.707107 +  0.000000 i  ==     ***********          [ 0.500000 ]     --- [  0.00000 rad ]
|1⟩     -0.707107 +  0.000000 i  ==     ***********          [ 0.500000 ] ---     [  3.14159 rad ]

For both the classic sparse-state and modern sparse-state simulator, the code treats Ri as a no-op, essentially dropping the phase of exp(i * theta). Since (Rz(theta))(Ri(theta / 2)) = R1(theta), this means each simulator treating Ri as a no-op must decide how to treat the resulting equality of Rz(theta) = R1(theta). Confusingly, they each do it in different ways. The classic sparse-state simulator does this by implementing R1 as a phase shift and having Rz just perform an R1. Checking that by having the classic sparse-state operate on |+⟩ with a parameter of PI(), we see:

Rz:
|0⟩      0.707107 +  0.000000 i  ==     **********           [ 0.500000 ]     --- [  0.00000 rad ]
|1⟩     -0.707107 +  0.000000 i  ==     **********           [ 0.500000 ] ---     [  3.14159 rad ]

Ri:
|0⟩      0.707107 +  0.000000 i  ==     **********           [ 0.500000 ]     --- [  0.00000 rad ]
|1⟩      0.707107 +  0.000000 i  ==     **********           [ 0.500000 ]     --- [  0.00000 rad ]

R1:
|0⟩      0.707107 +  0.000000 i  ==     **********           [ 0.500000 ]     --- [  0.00000 rad ]
|1⟩     -0.707107 +  0.000000 i  ==     **********           [ 0.500000 ] ---     [  3.14159 rad ]

But the modern sparse-state simulator (plus the stdlib decompositions in the modern QDK) goes the other way, and only defines Rz, letting R1 just be an application of Rz. So taking the same application to the |+⟩ state with a parameter of PI(), the modern sparse-state simulator gives:

Rz:
 Basis | Amplitude      | Probability | Phase
 -----------------------------------------------
   |0⟩ |  0.0000−0.7071𝑖 |    50.0000% |  -1.5708
   |1⟩ |  0.0000+0.7071𝑖 |    50.0000% |   1.5708

R(PauliI):
 Basis | Amplitude      | Probability | Phase
 -----------------------------------------------
   |0⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000
   |1⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000

R1:
 Basis | Amplitude      | Probability | Phase
 -----------------------------------------------
   |0⟩ |  0.0000−0.7071𝑖 |    50.0000% |  -1.5708
   |1⟩ |  0.0000+0.7071𝑖 |    50.0000% |   1.5708

Arguably, the classic full-state simulator is mathematically correct, as each gate has distinct behavior that corresponds to the matrix traditionally used to represent that gate. Both the sparse-state simulators cheat by having Ri = I, losing expected phase, and the classic sparse-state uses R1 for Rz (so Rz is missing the expected phase) while modern sparse-state use Rz for R1 (making R1 have an unexpected phase). To better match expectation, it seems like the right fix is what I'd worried it would be: introduce a global phase intrinsic that is used in simulation and treated as a no-op when compiling for hardware.

As an added bonus quirk of global phase, we expect Rx(PI()) = -iX and Ry(PI()) = -iY. This is true for the classic full-state simulator, where again operation on |+⟩ shows the expected phase difference:

X:
|0⟩      0.707107 +  0.000000 i  ==     ***********          [ 0.500000 ]     --- [  0.00000 rad ]
|1⟩      0.707107 +  0.000000 i  ==     ***********          [ 0.500000 ]     --- [  0.00000 rad ]

Rx:
|0⟩      0.000000 + -0.707107 i  ==     ***********          [ 0.500000 ]    ↓    [ -1.57080 rad ]
|1⟩      0.000000 + -0.707107 i  ==     ***********          [ 0.500000 ]    ↓    [ -1.57080 rad ]

But for both classic and modern sparse-state simulation, the Rx/Ry gates have an optimization that detects when the rotation is effectively a Pauli rotation and hands off to the corresponding X or Y implementation, causing rotations by PI() to lose their phase (see this code for the logic in the modern sparse-state simulator). So trying the same trick of operating on |+⟩, we can see the unexpected equality and missing global phase:

X:
 Basis | Amplitude      | Probability | Phase
 -----------------------------------------------
   |0⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000
   |1⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000

Rx:
 Basis | Amplitude      | Probability | Phase
 -----------------------------------------------
   |0⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000
   |1⟩ |  0.7071+0.0000𝑖 |    50.0000% |   0.0000

I think this can also be solved via application of the global phase gate, where the shortcut linked above applies the phase of -i manually after the application of X or Y.

Given all of that, I would propose two work items:

The task on this issue for revising R behavior for PauliI be expanded into a linked issue of it's own that covers the introduction of a global phase intrinsic along with related decomposition updates and reviewing behavior of controlled specializations
An additional linked issue be added that covers the update of the shortcut logic for Rx and Ry in simulation so that expected global phase is included rather than skipped

Does this sound reasonable?

tcNickolas commented 6 months ago

Sounds convincing :-) We'll need to include the examples I had in this thread in the tests to make sure they are processed correctly, and I'll report back if I find any additional odd cases

tcNickolas commented 6 months ago

In the teleportation sample (https://github.com/microsoft/qsharp/pull/1456) global phase shows up on the target qubit for teleporting |->, but bringing qubit allocation within the loop doesn't really help, since it occurs within one iteration as a result of teleportation measurement.

swernli commented 6 months ago

In the teleportation sample (#1456) global phase shows up on the target qubit for teleporting |->, but bringing qubit allocation within the loop doesn't really help, since it occurs within one iteration as a result of teleportation measurement.

Yup, I think we found another bug (or at the very least, confusing behavior) specifically for DumpRegister (see https://github.com/microsoft/qsharp/pull/1456#discussion_r1586588243). I'll add a task above for that.

swernli commented 5 months ago

Turns out there was one more issue in the simulator that affected rotations by 2 Pi. I put up a fix for this in the simulator which we'll need to pull into qsharp by updating the tag.

@tcNickolas thanks for pointing this out!

tcNickolas commented 4 months ago

I'm trying out release 1.6.0, and I think now $R_y$ is now $-I$ instead of $I$...

        use q = Qubit();
        Ry(0., q);
        DumpMachine();

gives me state $-\ket{0}$, and it should be just $\ket{0}$.

Do we have a set of corner case tests to check these behaviors automatically? I have a set of tests in https://github.com/tcNickolas/quantum-programming-in-depth/blob/main/3_unitary_implementation/qsharp-v1/03_test_unitary/test_one_qubit.py that I'm now trying to update to remove assert_matrices_equal, to just compare two matrices directly, and it keeps showing me discrepancies in the global phase handling.

swernli commented 4 months ago

Oh boy... this issue was introduced by https://github.com/qir-alliance/qir-runner/pull/173, which was trying to fix rotations by 2 Pi. The change there always adds a phase of -1 if sin(theta / 2.0) is zero, but that doesn't differentiate between even and odd rotations by Pi. I need to add a check of the cos(theta / 2.0) value, already calculated as m00, to get it to correctly add or skip the phase.

swernli commented 3 months ago

We've gotten confirmation that the phase issues we know of have all been resolved, so closing this tracking issue. Please file fresh issues if new problems with phase crop up!

microsoft / qsharp

Mitigating confusing global phase #1317