qiboteam / qibojit-benchmarks

Benchmark code for qibojit performance assessment
Apache License 2.0

Add HybridQ #25

Closed stavros11 closed 2 years ago

stavros11 commented 2 years ago

Adds the HybridQ library for CPU and GPU. Currently I observe the following issues:

mlazzarin commented 2 years ago

TFQ tests are failing in CI if HybridQ is installed. I suspect that HybridQ installs a different version of tensorflow.

If we can't solve this problem, we may opt to simply remove TFQ.

stavros11 commented 2 years ago

This is almost ready, @mlazzarin, @andrea-pasquale, you can have a look when you have time. The issues with CI were a false alarm: Python 3.9 seems to work fine and there is no conflict with TFQ, as HybridQ does not depend on tensorflow. This adds two backends: hybridq and hybridq-gpu. The first uses custom operators on CPU and the second uses jax einsum on GPU. If I install the library from pip I get warnings that it was not compiled properly and it falls back to einsum on CPU. The only fix I found was installing through conda.

An issue that I still have (other than the parallel flag mentioned above) is that two tests fail on GPU (with the hybridq-gpu backend). After some experiments I managed to reproduce this issue with the following script:

from hybridq.circuit import Circuit
from hybridq.gate import Gate
from hybridq.circuit.simulation import simulate

# minimal gate sequence that triggers the error; removing any gate
# or any of the simulate options below makes it run fine
circuit = Circuit()
circuit.append(Gate('RX', params=[0], qubits=(0,)))
circuit.append(Gate('RX', params=[0], qubits=(1,)))
circuit.append(Gate('RX', params=[0], qubits=(2,)))
circuit.append(Gate('H', qubits=(0,)))
circuit.append(Gate('H', qubits=(1,)))
circuit.append(Gate('H', qubits=(2,)))
circuit.append(Gate('CZ', qubits=(1, 2)))
circuit.append(Gate('RZ', params=[0], qubits=(0,)))
circuit.append(Gate('CZ', qubits=(0, 1)))
circuit.append(Gate('RX', params=[0], qubits=(0,)))
circuit.append(Gate('CZ', qubits=(1, 2)))
circuit.append(Gate('CZ', qubits=(1, 2)))
circuit.append(Gate('H', qubits=(1,)))
circuit.append(Gate('H', qubits=(2,)))

# |00...0> initial state encoded as a bitstring
initial_state = len(circuit.all_qubits()) * '0'
simulate(circuit, optimize="evolution-einsum", backend="jax",
         initial_state=initial_state, compress=0, simplify=False)

which gives me the following error

TypeError: einsum() got an unexpected keyword argument 'dtype'

If you could confirm that you also get this, that would be great. I will probably open an issue on hybridq about this, as it does not seem related to the benchmarks. This is the minimal example I could find that reproduces it: if I remove any gate or any simulation option, it works fine!

mlazzarin commented 2 years ago

I can reproduce it.

andrea-pasquale commented 2 years ago

I can also reproduce the issue. I also experienced the fallback to CPU, both after installing with pip and after installing with conda. It seems that sometimes jax fails to recognize the GPU, as explained here.

I managed to solve this issue, for both the conda and the pip installation, using the following command:

pip install --upgrade jax jaxlib==0.1.75+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_releases.html
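To check whether the reinstalled jaxlib actually picks up the GPU, one can list the devices jax sees (a quick sanity check, not part of the benchmarks):

```shell
# a working CUDA setup lists a GPU/CUDA device here;
# seeing only CpuDevice(id=0) means jax fell back to CPU
python -c "import jax; print(jax.devices())"
```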

mlazzarin commented 2 years ago

Concerning the parallel flag, I think it is not meant to be used with state vector simulation. In section IV.B of the HybridQ paper https://arxiv.org/pdf/2111.06868.pdf (the one related to state vector simulation) there is no reference to such a flag, but it's written that:

Multi-threaded optimization is supported for optimize=’evolution-hybridq’ only via OpenMP, and the number of used threads can be tuned by modifying the environment variable OMP_NUM_THREADS (if not set, all cores are used by default).

On the other hand, there is an explicit reference to the parallel flag in the section regarding tensor networks:

Depending on the backend, multi-thread optimization is supported via either OpenMP or MKL, and the number of used threads can be tuned by modifying the environment variable OMP_NUM_THREADS and MKL_NUM_THREADS respectively. Finding the optimal contraction scheme can be also parallelized, which is activated by using parallel=True (False by default).

and there is also a reference in the section regarding Clifford expansion (IV.D).

In fact, it seems like there is no reference to the parallel flag in the simulate_evolution method of HybridQ: https://github.com/nasa/hybridq/blob/c7c2c85c523d46f5de3ae2e9c56ac2a2353b9611/hybridq/circuit/simulation/simulation.py#L379-L789
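Following the quoted passage, the thread count for state vector simulation would then be controlled purely through the environment, e.g. by prefixing the benchmark invocation (the script name here is a placeholder):

```shell
# OpenMP reads OMP_NUM_THREADS when the native library is loaded,
# so it must be set before the Python process starts
OMP_NUM_THREADS=8 python benchmark.py
```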

stavros11 commented 2 years ago

Thank you both for the input.

I managed to solve this issue, for both the conda and the pip installation, using the following command:

pip install --upgrade jax jaxlib==0.1.75+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_releases.html

I had the same issue with GPU and that's exactly how I solved it. As for the other issue with einsum I opened nasa/hybridq#83 but there is no response yet.

Concerning the parallel flag, I think it is not meant to be used with state vector simulation. In section IV.B of the HybridQ paper https://arxiv.org/pdf/2111.06868.pdf (the one related to state vector simulation) there is no reference to such a flag, but it's written that:

Multi-threaded optimization is supported for optimize=’evolution-hybridq’ only via OpenMP, and the number of used threads can be tuned by modifying the environment variable OMP_NUM_THREADS (if not set, all cores are used by default).

Indeed, that's correct, thank you for finding this. I tried OMP_NUM_THREADS and it seems to work as expected. Since the parallel option is not relevant for us, as we do state vector simulation, I will remove it. The only issue is that I am not sure how to set OMP_NUM_THREADS internally in the script, so that it follows the nthreads option. If I use os.environ inside the backend initialization it does not seem to have an effect. Perhaps it can only be passed as an environment variable, in which case we should just use it directly in the benchmark.
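One pattern that usually works is setting the variable from Python before the OpenMP-linked library is imported, since the OpenMP runtime reads OMP_NUM_THREADS once, when the native extension is first loaded. A minimal sketch (the thread count and the commented import are illustrative, not the benchmarks' actual code):

```python
import os

# must run before importing hybridq (or anything else that loads
# OpenMP); changing the variable after the first import has no effect
os.environ["OMP_NUM_THREADS"] = "4"

# deferred import, so the runtime picks up the value set above
# from hybridq.circuit.simulation import simulate

print(os.environ["OMP_NUM_THREADS"])
```

This would explain why setting os.environ inside the backend initialization has no effect: by that point the extension module is typically already loaded.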

stavros11 commented 2 years ago

As for the other issue with einsum I opened nasa/hybridq#83 but there is no response yet.

There is a PR in hybridq (nasa/hybridq#85) which fixes this issue. I confirmed that our GPU tests now pass if I install hybridq from source using this branch. The only issue is that, as I wrote above, for me only the conda version installs their CPU custom operators properly, so I guess we will have to wait for them to merge the new PR and make a release to get an optimal version for both CPU and GPU. Otherwise, we should use the conda version for CPU (and for non-problematic GPU configurations) and install from source if we want to run the supremacy circuit on GPU.

Other than that, I believe this PR is complete, right?