Gate order optimization for multi-GPU circuits

stavros11 commented 4 years ago

The main bottleneck of the multi-GPU implementation is swapping the global qubits. In some circuits the number of swaps required can be reduced by rearranging commuting gates. It would be useful to implement an algorithm that does this automatically to the extend that this is possible. I don't know if there is a general algorithm that can find the optimal gate arrangement for any circuit.

joseignaciolatorre commented 4 years ago

We did an optimization reducing SWAP gates centuries ago with Román Orús and I think Artur. The idea was to check not one operation but two at a time and see where each qubit should go after the first gate to the second one. The optimal was to SWAP qubits to a position which then would minimize the number of the subsequent SWAPS. After optimization we only needed 1/3 of the naive counting of SWAPS.

But those algorithms were using local gates. Does this help?

cheers ji

On Fri, Jun 12, 2020 at 10:36 PM Stavros Efthymiou notifications@github.com wrote:

The main bottleneck of the multi-GPU implementation is swapping the global qubits. In some circuits the number of swaps required can be reduced by rearranging commuting gates. It would be useful to implement an algorithm that does this automatically to the extend that this is possible. I don't know if there is a general algorithm that can find the optimal gate arrangement for any circuit.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

stavros11 commented 4 years ago

Thank you for the comment. Is there any reference that describes this idea in more detail? I searched for old papers with Román but could not find anything related. It is very likely that this can help because the multi-GPU problem is very similar to the quantum hardware case where people use SWAPs due to nearest neighbor constraints, etc.

In distributed implementations (like the multi-GPU) we use some qubits (the so-called "global") to split the state vector in pieces for each device, with number of global qubits = log2(number of devices). Applying gates to these qubits requires communication between devices. In my implementation I change the global qubits with local ones to apply gates to them. This should be equivalent with introducing SWAP gates in the circuit. The goal is to select the global qubits and rearrange the gates in such a way so that the least SWAPs are required.

Variational circuits are among worst cases because they have gates in all qubits so one SWAP per layer is certainly required. Other circuits can be optimized. For example QFT with one global qubit can be done with one swap only.

joseignaciolatorre commented 4 years ago

The SWAP trick is sort of explained in https://arxiv.org/pdf/quant-ph/0503174.pdf somewhere below eq 16.

The problem, as you present it, worsens for non-local gates. The idea then should be to find out the gates to be involved and put them together. It pays to gather several layer of operations to have a better minimum of SWAPS.

In a way, this is a part of a quantum compiler.

Cheers ji

On Sat, Jun 13, 2020 at 8:39 PM Stavros Efthymiou notifications@github.com wrote:

Thank you for the comment. Is there any reference that describes this idea in more detail? I searched for old papers with Román but could not find anything related. It is very likely that this can help because the multi-GPU problem is very similar to the quantum hardware case where people use SWAPs due to nearest neighbor constraints, etc.

In distributed implementations (like the multi-GPU) we use some qubits (the so-called "global") to split the state vector in pieces for each device, with number of global qubits = log2(number of devices). Applying gates to these qubits requires communication between devices. In my implementation I change the global qubits with local ones to apply gates to them. This should be equivalent with introducing SWAP gates in the circuit. The goal is to select the global qubits and rearrange the gates in such a way so that the least SWAPs are required.

Variational circuits are among worst cases because they have gates in all qubits so one SWAP per layer is certainly required. Other circuits can be optimized. For example QFT with one global qubit can be done with one swap only.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Quantum-TII/qibo/issues/117#issuecomment-643661482, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIRGEICYV6QU2FO4A74OHR3RWPBUZANCNFSM4N4RYSZA .

qiboteam / qibo

Gate order optimization for multi-GPU circuits #117