qiskit-community / qiskit-algorithms

A library of quantum algorithms for Qiskit.
https://qiskit-community.github.io/qiskit-algorithms/
Apache License 2.0
116 stars 59 forks source link

ComputeUncompute State Fidelity threading issue #91

Closed woodsp-ibm closed 1 year ago

woodsp-ibm commented 1 year ago

Description

Recently this job failed, as below, in similar manner to that raised by

Looking into this further its not the test case as such rather the ComputeUncompute implementation and the job thread logic it uses, more detail below.

test.state_fidelities.test_compute_uncompute.TestComputeUncompute.test_symmetry
-------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/home/runner/work/qiskit-algorithms/qiskit-algorithms/qiskit_algorithms/state_fidelities/compute_uncompute.py", line 161, in _run
    result = job.result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/qiskit/primitives/primitive_job.py", line 55, in result
    return self._future.result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/qiskit/primitives/sampler.py", line 88, in _call
    if len(value) != len(self._parameters[i]):

    IndexError: list index out of range

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):

      File "/home/runner/work/qiskit-algorithms/qiskit-algorithms/test/state_fidelities/test_compute_uncompute.py", line 106, in test_symmetry
    results_2 = job_2.result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/qiskit/primitives/primitive_job.py", line 55, in result
    return self._future.result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception

      File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

      File "/home/runner/work/qiskit-algorithms/qiskit-algorithms/qiskit_algorithms/state_fidelities/compute_uncompute.py", line 163, in _run
    raise AlgorithmError("Sampler job failed!") from exc

    qiskit_algorithms.exceptions.AlgorithmError: 'Sampler job failed!'

Reproducing the error

Running this script, which replicates the logic the failing test, fairly quickly fails the same way as show above.

import numpy as np

from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.primitives import Sampler

from qiskit_algorithms.state_fidelities import ComputeUncompute

def main():
    i = 1
    while True:
        do_fidelity()
        print(f"{i}\r", end="")
        i+=1

def do_fidelity(serial=False):

    parameters = ParameterVector("x", 2)

    circuit = QuantumCircuit(2)
    circuit.rx(parameters[0], 0)
    circuit.rx(parameters[1], 1)

    sampler = Sampler()
    params_l = np.array([[0, 0], [np.pi / 2, 0], [0, np.pi / 2], [np.pi, np.pi]])
    params_r = np.array([[0, 0], [0, 0], [np.pi / 2, 0], [0, 0]])

    fidelity = ComputeUncompute(sampler)
    n = len(params_l)
    job_1 = fidelity.run(
        [circuit] * n, [circuit] * n, params_l, params_r
    )
    job_2 = fidelity.run(
        [circuit] * n, [circuit] * n, params_r, params_l
    )
    results_1 = job_1.result()
    results_2 = job_2.result()

    if not np.allclose(results_1.fidelities, results_2.fidelities, atol=1e-16):
        print("Failed")
        sys.exit()

if __name__ == "__main__":
    main()

Analysis

From investigation it appears that failure is that the sampler.run() is part of the logic called by the function that is passed job thread producing the result. That sampler code updates a number of instance variables and is not safe for such usage. I verified things by acquiring and releasing a threading lock in the fidelity primitive around the run call and the above has run over a weekend with no failure. That will not do as a general solution though.

Suggestion

My suggestion would be, if we want the same async behavior on the result/job, to change what is done in the thread to just be the call to the get the sampler job result, from running the sampler (which should happen in the main thread instead) and format up the result. It will be far less in the thread but that should be safe.