qiskit-community / qiskit-machine-learning

Quantum Machine Learning
https://qiskit-community.github.io/qiskit-machine-learning/
Apache License 2.0

Training with sampler_qnn.py breaks on a single error when calling the Sampler primitive #566

Closed gines-carrascal closed 1 year ago

gines-carrascal commented 1 year ago

Environment

What is happening?

When you call fit on a VQC, training makes many calls to the Sampler primitive, and this can take hours. If even one of those calls fails during the process (e.g. with a timeout), the whole process stops and all the work done so far is lost.
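Until the library handles this, one user-side mitigation is to record the best weights seen so far from the training callback. That way a failure does not throw away all progress. The sketch below assumes the callback is invoked as callback(weights, objective_value), which matches the call self._callback(objective_weights, objective_value) in the traceback. CheckpointCallback is a hypothetical helper, not part of qiskit-machine-learning, and the calls at the end simulate the optimizer's evaluations.

```python
import numpy as np


class CheckpointCallback:
    """Hypothetical helper that remembers the best weights seen so far.

    Assumes it is called as callback(weights, objective_value).
    """

    def __init__(self):
        self.best_weights = None
        self.best_value = np.inf
        self.history = []

    def __call__(self, weights, objective_value):
        self.history.append(float(objective_value))
        if objective_value < self.best_value:
            self.best_value = float(objective_value)
            # copy so later in-place updates by the optimizer cannot alter it
            self.best_weights = np.copy(weights)


# Simulated objective evaluations standing in for the optimizer loop
checkpoint = CheckpointCallback()
checkpoint(np.array([0.1, 0.2]), 0.9)
checkpoint(np.array([0.3, 0.1]), 0.5)
checkpoint(np.array([0.4, 0.0]), 0.7)
print(checkpoint.best_value, checkpoint.best_weights)
```

After a failed fit, the saved weights could be passed as initial_point when constructing a new VQC (for example VQC(..., initial_point=checkpoint.best_weights)). Training would then resume from the best point found instead of from scratch.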

Example traceback (the start of the chained output is not included):


The above exception was the direct cause of the following exception:

QiskitMachineLearningError                Traceback (most recent call last)
Cell In[37], line 13
     10 objective_func_vals = []
     12 start = time.time()
---> 13 vqc_r.fit(train_features, train_labels)
     14 elapsed = time.time() - start
     16 print(f"Training time: {round(elapsed)} seconds")

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/trainable_model.py:201, in TrainableModel.fit(self, X, y)
    198 if not self._warm_start:
    199     self._fit_result = None
--> 201 self._fit_result = self._fit_internal(X, y)
    202 return self

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/classifiers/vqc.py:190, in VQC._fit_internal(self, X, y)
    187 if isinstance(self._neural_network, (CircuitQNN, SamplerQNN)):
    188     self._neural_network.set_interpret(self._get_interpret(num_classes), num_classes)
--> 190 return super()._minimize(X, y)

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/classifiers/neural_network_classifier.py:127, in NeuralNetworkClassifier._minimize(self, X, y)
    123         function = MultiClassObjectiveFunction(X, y, self._neural_network, self._loss)
    125 objective = self._get_objective(function)
--> 127 return self._optimizer.minimize(
    128     fun=objective,
    129     x0=self._choose_initial_point(),
    130     jac=function.gradient,
    131 )

File /opt/conda/lib/python3.10/site-packages/qiskit/algorithms/optimizers/scipy_optimizer.py:148, in SciPyOptimizer.minimize(self, fun, x0, jac, bounds)
    145     swapped_deprecated_args = True
    146     self._options["maxfun"] = self._options.pop("maxiter")
--> 148 raw_result = minimize(
    149     fun=fun,
    150     x0=x0,
    151     method=self._method,
    152     jac=jac,
    153     bounds=bounds,
    154     options=self._options,
    155     **self._kwargs,
    156 )
    157 if swapped_deprecated_args:
    158     self._options["maxiter"] = self._options.pop("maxfun")

File /opt/conda/lib/python3.10/site-packages/scipy/optimize/_minimize.py:702, in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    699     res = _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,
    700                         **options)
    701 elif meth == 'cobyla':
--> 702     res = _minimize_cobyla(fun, x0, args, constraints, callback=callback,
    703                             **options)
    704 elif meth == 'slsqp':
    705     res = _minimize_slsqp(fun, x0, args, jac, bounds,
    706                           constraints, callback=callback, **options)

File /opt/conda/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py:34, in synchronized.<locals>.wrapper(*args, **kwargs)
     31 @functools.wraps(func)
     32 def wrapper(*args, **kwargs):
     33     with _module_lock:
---> 34         return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py:270, in _minimize_cobyla(fun, x0, args, constraints, rhobeg, tol, maxiter, disp, catol, callback, **unknown_options)
    267         callback(np.copy(x))
    269 info = np.zeros(4, np.float64)
--> 270 xopt, info = cobyla.minimize(calcfc, m=m, x=np.copy(x0), rhobeg=rhobeg,
    271                               rhoend=rhoend, iprint=iprint, maxfun=maxfun,
    272                               dinfo=info, callback=wrapped_callback)
    274 if info[3] > catol:
    275     # Check constraint violation
    276     info[0] = 4

File /opt/conda/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py:258, in _minimize_cobyla.<locals>.calcfc(x, con)
    257 def calcfc(x, con):
--> 258     f = fun(np.copy(x), *args)
    259     i = 0
    260     for size, c in izip(cons_lengths, constraints):

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/trainable_model.py:275, in TrainableModel._get_objective.<locals>.objective(objective_weights)
    274 def objective(objective_weights):
--> 275     objective_value = function.objective(objective_weights)
    276     self._callback(objective_weights, objective_value)
    277     return objective_value

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/objective_functions.py:191, in OneHotObjectiveFunction.objective(self, weights)
    189 def objective(self, weights: np.ndarray) -> float:
    190     # probabilities is of shape (N, num_outputs)
--> 191     probs = self._neural_network_forward(weights)
    192     # float(...) is for mypy compliance
    193     value = float(np.sum(self._loss(probs, self._y)) / self._num_samples)

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/algorithms/objective_functions.py:102, in ObjectiveFunction._neural_network_forward(self, weights)
     97 # if we get the same weights, we don't compute the forward pass again.
     98 if self._last_forward_weights is None or (
     99     not np.all(np.isclose(weights, self._last_forward_weights))
    100 ):
    101     # compute forward and cache the results for re-use in backward
--> 102     self._last_forward = self._neural_network.forward(self._X, weights)
    103     # a copy avoids keeping a reference to the same array, so we are sure we have
    104     # different arrays on the next iteration.
    105     self._last_forward_weights = np.copy(weights)

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/neural_networks/neural_network.py:224, in NeuralNetwork.forward(self, input_data, weights)
    222 input_, shape = self._validate_input(input_data)
    223 weights_ = self._validate_weights(weights)
--> 224 output_data = self._forward(input_, weights_)
    225 return self._validate_forward_output(output_data, shape)

File /opt/conda/lib/python3.10/site-packages/qiskit_machine_learning/neural_networks/sampler_qnn.py:364, in SamplerQNN._forward(self, input_data, weights)
    362         results = job.result()
    363     except Exception as exc:
--> 364         raise QiskitMachineLearningError("Sampler job failed.") from exc
    365     result = self._postprocess(num_samples, results)
    366 else:

QiskitMachineLearningError: 'Sampler job failed.'
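The line "The above exception was the direct cause of the following exception" comes from Python's exception chaining. SamplerQNN._forward wraps the original error with raise ... from exc, so the underlying cause (for example a timeout) remains available on the wrapper's __cause__ attribute. Below is a minimal pure-Python sketch of that pattern. It uses a stand-in exception class rather than the real qiskit_machine_learning one, and a simulated TimeoutError as the hypothetical underlying failure.

```python
class QiskitMachineLearningError(Exception):
    """Stand-in for the library's exception class (defined here for illustration)."""


def run_job():
    # Simulate the underlying failure, e.g. a timeout from the primitive
    raise TimeoutError("job timed out")


def forward():
    try:
        return run_job()
    except Exception as exc:
        # Same wrap-and-chain pattern as sampler_qnn.py line 364 in the traceback
        raise QiskitMachineLearningError("Sampler job failed.") from exc


try:
    forward()
except QiskitMachineLearningError as err:
    cause = err.__cause__
    print(type(cause).__name__, cause)  # prints: TimeoutError job timed out
```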

How can we reproduce the issue?

Call fit on a VQC. The error occurs randomly, whenever a single circuit evaluation fails.

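import time

from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit_ibm_runtime import Sampler as Sampler_r
from qiskit_machine_learning.algorithms.classifiers import VQC

# feature_map, ansatz, optimizer, callback_graph, train_features and
# train_labels are defined earlier in the notebook (not shown here)

service = QiskitRuntimeService(channel="ibm_quantum")
backend = service.backend("ibmq_qasm_simulator")

# Runtime Sampler bound to the cloud simulator backend
sampler_r = Sampler_r(session=backend)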

vqc_r = VQC(
    sampler=sampler_r,
    feature_map=feature_map,
    ansatz=ansatz,
    optimizer=optimizer,
    callback=callback_graph,
)

# clear objective value history
objective_func_vals = []

start = time.time()
vqc_r.fit(train_features, train_labels)
elapsed = time.time() - start

print(f"Training time: {round(elapsed)} seconds")
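Because the failure depends on a real timeout, it is hard to trigger on demand. The sketch below mimics the control flow with hypothetical FakeSampler and FakeJob classes, which are not Qiskit APIs. It runs a loop of sampler calls in which exactly one call fails, showing that the first exception aborts the loop and discards the results gathered so far.

```python
class FakeJob:
    """Hypothetical stand-in for a primitive job; raises when told to fail."""

    def __init__(self, value, fail):
        self._value = value
        self._fail = fail

    def result(self):
        if self._fail:
            raise TimeoutError("simulated sampler timeout")
        return self._value


class FakeSampler:
    """Hypothetical sampler whose Nth call returns a failing job."""

    def __init__(self, fail_on_call):
        self.calls = 0
        self.fail_on_call = fail_on_call

    def run(self, params):
        self.calls += 1
        return FakeJob(sum(params), fail=self.calls == self.fail_on_call)


def train(sampler, steps):
    completed = []
    for step in range(steps):
        # One sampler job per objective evaluation; any exception aborts the loop
        completed.append(sampler.run([step, step]).result())
    return completed


sampler = FakeSampler(fail_on_call=50)
try:
    train(sampler, steps=100)
except TimeoutError as exc:
    # The 49 completed evaluations are lost because train() never returns
    print(f"aborted after {sampler.calls} calls: {exc}")  # aborted after 50 calls: simulated sampler timeout
```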

What should happen?

In this scenario, it would be preferable to print a warning, substitute default values (e.g. all zeros) for the failed result, and continue training.

Any suggestions?

Change this behaviour:

try:
    results = job.result()
except Exception as exc:
    raise QiskitMachineLearningError("Sampler job failed.") from exc

to something like this (an idea, not actual code):

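# requires: from qiskit_ibm_runtime.exceptions import RuntimeJobFailureError
try:
    results = job.result()
except RuntimeJobFailureError as exc:
    # recoverable: warn and fall back to default (all-zero) values;
    # num_samples as in SamplerQNN._forward, the exact default shape is an assumption
    print("ERROR, recoverable:", exc)
    results = [0] * num_samples
except Exception as exc:
    # anything else is fatal
    print("ERROR, fatal")
    raise QiskitMachineLearningError("Sampler job failed.") from exc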
woodsp-ibm commented 1 year ago

There is an open issue about retry support in the primitives: Qiskit/qiskit-ibm-runtime#682

adekusar-drl commented 1 year ago

I'm closing the issue as it should be addressed in https://github.com/Qiskit/qiskit-ibm-runtime, particularly in the issue mentioned above.