QParallel. Support for explicit parallelism in quantum programs similar to OpenMP

vadym-kl commented 2 years ago

Is your feature request related to a problem? Please describe.

There is a crucial interaction between parallelism and memory management in quantum computing: Depending on the memory management strategy, the compiler (or the run-time system) may introduce artificial dependencies, e.g., among loop iterations that would otherwise be subject to parallelism. As an example, consider the following code snippet:

for (q in qubits) {
  Op(q);
}

Op is some quantum operation that takes a qubit as input. If Op is in the native gate set of the target architecture, for example, then this loop may be executed in parallel. However, if Op is temporarily allocates and then deallocates a helper qubit, quantum memory management may decide to reuse the same helper qubit for all iterations of the loop.

An example of this is given in above. On the right implementation of a controlled rotation gate is shown that uses one helper qubit to avoid doubling the number of single-qubit rotations, which induces a large overhead when implemented fault-tolerantly. Consequently, all potential parallelism is lost due to the introduction of this dependency. The no-cloning theorem prohibits removing this dependency through copying. Furthermore, an alternative resource management strategy that opts to never reuse resources would result in prohibitively large numbers of qubits and gates. As a solution to this problem, we propose language constructs to explicitly mark parallel regions.

Describe the solution you'd like We proposed a language extension, with the example above could be expressed as

parallel for (q in qubits) {
  Op(q);
}

where the parallel indicates that managed resources should not be reused across loop iterations. In more details, the syntax and semantics of our proposed language extension is provided below. QParallel provides support for parallel sections and loops, in addition to specific clauses that facilitate making existing code parallel.

Parallel sections

A parallel sections statement defines a portion of the program that should be run in parallel. The syntax is as follows:

parallel sections {
  SectionSequence
}

The code encapsulated in a parallel sections block contains a sequence of section statements. Each section should be run in parallel with other section code blocks, if possible.

Section statement

The section statement marks a section that can be run in parallel with other section blocks in the same parallel sections environment.

section {
  StatementBlock
}

A precondition to the usage of this statement is that each section can be executed in parallel. Consequently, different sections must not share any qubits. In particular, to ensure parallel execution, automatic memory management must not reuse temporary qubits that are allocated and deallocated inside a different section of the same parallel sections block.

Parallel for statement

Another case that commonly benefits from parallelization is a loop where each iteration of the loop can be run in parallel. To support this use case, we introduce a parallel for statement. Its behavior is identical to that of a parallel sections statement, where each iteration of a loop is considered a separate section inside of it. The parallel for statement can be used as follows:

parallel for i in iterable {
  StatementBlock
}

Fanout clause

Consider the following for-loop in Q#:

for t in targets {
  CNOT(control, t);
}

This loop cannot be executed in parallel because all CNOT operations share the same control qubit, declared outside of the loop. As a remedy, we introduce the so-called fanout clause, which allows us to rewrite the code above as a parallel for-loop with fanout:

parallel for t in targets fanout(control, 4) {
  CNOT(control, t);
}

In this example, the control qubit is fanned out to three extra qubits, resulting in a total of 4 control qubits that are shared among loop iterations. This reduces the runtime of the loop by up to 4×.

More generally, a fanout(qubits, n) clause fans out the specified qubits to n-1 temporary qubits, so that loop iterations can use one of these "entangled copies" and thus remove some or all of the data dependencies. At the end of the loop, an unfanout operation, which is the inverse of the initial fanout operation, will consolidate the effect of all iterations on their respective copies of the fanned-out qubit(s).

BNF

Backus-Naur form below describes the syntax of our proposed language extension.

Describe alternatives you've considered Alternative is to add Q# intrinsics to directly call qubit manager functionality implemented in C# and C++ without any changes to the language. It would be also nice to add documentation explaining how this intrinsic can be used and some providing examples.

C# runtime implements required functionality here:

and C++ QIR runtime implements required functionality here:

Additional context This will allow to avoid issues similar to the ones discussed in: https://github.com/microsoft/qsharp-runtime/issues/1037 https://github.com/microsoft/qsharp-runtime/issues/419 and make quantum parallelism more intuitive for developers . Above features can simplify and improve code in https://github.com/microsoft/QuantumEllipticCurves. Detailed analysis of this feature and results obtained using a prototype implementation are available at arXiv.2210.03680.

Affidavit (please fill out)

Please add ticks by placing a cross in the box:

[x] I have searched both open and closed suggestions and proposals on this site and believe this is not a duplicate.
[x] I believe that the spirit of this suggestion is aligned with the design principles and general vision for Q#.

Please tick all that apply:

[x] This is not a breaking change to the Q# language design
[x] I or my organization would be willing to help implement and/or test this

vadym-kl commented 2 years ago

Above proposal is a joint work by @msoeken, @alexva, @thomashaener. @sam-jaques, @fvirdia, @cryptosidh, this proposal might be helpful to mitigate the issues with quantum parallelism you have encountered while working on quantum cryptography resource estimates in Q#.

bettinaheim commented 2 years ago

Hi @vadym-kl. Thanks for filing this. This suggestion contains a couple of different requests; would you mind splitting this into individual issues that can be processed separately?

vadym-kl commented 2 years ago

@bettinaheim I beleive there is a conceptual discussion about Q parallel needed first and this issue could be a good place to have it. Once there is a common conceptual ground, perhaps one can create issues for 'parallel for', 'parallel sections', 'section' and 'fanout' discussing further details and referring to this issues for common high level picture.

sam-jaques commented 1 year ago

This would be great, generally. It seems like most of the issues occur in loops of various sorts, so circuit designers probably know what to tell the compiler for each loop (whether to be parallel or not). Would each "Section" use the width- or depth-minimizing compiler? It seems like the width-minimizing compiler would be the better choice, leaving the programmer to manually create the parallelism, but maybe it could be given as a parameter?

A more ambitious feature would be to allow access to the current number of allocated qubits, so the compilation/running of a circuit can dynamically decide whether it should be parallel or not. Having the change suggested in this issue would probably make this extra feature easier to implement and use at some future date.

microsoft / qsharp-language

QParallel. Support for explicit parallelism in quantum programs similar to OpenMP #162