qir-alliance / qir-runner

QIR bytecode runner to assist with QIR development and validation
https://qir-alliance.github.io/qir-runner
MIT License
23 stars 4 forks source link

Semi-Classical Operations Queue for Sparse Simulator #188

Open sam-jaques opened 1 week ago

sam-jaques commented 1 week ago

In the C++ version of the sparse simulator, semi-classical operations (i.e., those that would only permute or apply phase to computational basis states) would be queued instead of immediately executed, so that they could all be executed at once when necessary. For quantum circuits with long sections of these gates (e.g., quantum circuits for classical arithmetic), this added a substantial performance gain for simulations.

A drawback of this feature is that there could be a disconnect between the internal quantum state and the gates that a user expected to have been executed. For example, this was turned off in debug mode so that any simulator errors (like released qubits not in a zero state) would occur when a gate was requested, rather than much later.

As far as I can tell this feature was removed in the transition to Rust. Was this done for code safety reasons, or is this a feature that would be useful to have in the future?

swernli commented 1 week ago

Hi @sam-jaques! In the initial commit of this repo the gate queuing was initially skipped, but it was added shortly afterward in https://github.com/qir-alliance/qir-runner/pull/17. That queues the same gates in the same fashion as the old C++ implementation.

Is there a specific use case you are looking at that isn't working or are you seeing a difference in behavior when compared to other simulation?

sam-jaques commented 1 week ago

That looks slightly different than what I'm referring to, as far as I can tell.

For example, if a Y gate is requested, then it can commute through the queues for H, Rx, Ry, Rz, but once it commutes through, it can be queued afterwords so that if there is a sequence of requests such as Y, CNOT, Z, X, CCNOT, T, they can all be implemented at once.

It looks like in the Y gate function at line 631 https://github.com/qir-alliance/qir-runner/blob/2fac7354f4e98c3efc05f8fd4180a02d0b712907/sparsesim/src/lib.rs#L631 that when a Y gate is requested, it is immediately applied. Compare to the line in C++ simulator https://github.com/microsoft/qsharp-runtime/blob/da1872db685826861313ef078acab615e04f1334/src/Simulation/NativeSparseSimulator/SparseSimulator.h#L248, where it is added to "_queued_operations".

I was asking because of an implementation of Shor's algorithm with the new QDK backend, which takes about 10-20 seconds for a 100-bit integer, whereas the C++ version would take under a second for the same problem. It's not entirely comparable since the implementations of Shor's algorithm are different, but I thought this feature might be creating the difference.

swernli commented 1 week ago

Oh, I see what you mean. Yes, that kind of queuing is not present, and gates that do not change the size of the state vector are executed immediately. At the time, the performance benefit of this kind of queuing was not seen as significant enough to implement it, but if you are running into scenarios where you are stuck waiting because of it then it might be worth revisiting. At the very least, these kinds of transformations could be tracked so that they are applied to each state during a single iteration through the vector rather than via iterating the vector each time. This will make a bigger difference for dense states rather than sparse ones, which it sounds like you are hitting with Shor's.

swernli commented 4 days ago

@sam-jaques I've added a PR that introduces more of the queuing. In the future, this queue could also be a place we optimize more via gate commuting and elimination, but that can wait for a follow up PR.