Quantum kernel - missing indices in regression tasks

Environment

Qiskit Machine Learning version: 0.4.0
Python version: 3.9.7 64-bit
Operating system: Windows 10 Enterprise Version 21H2

What is happening?

The bug is in the evaluate routine of the Qiskit QuantumKernel class. It shows up in regression tasks with a quantum kernel (SVR, ...) when the backend is not the statevector simulator, and only when a non-symmetric kernel matrix is computed. If a training point and a testing point are identical, that pair is never added to the to_be_computed_data_pair list. Its kernel entry therefore stays at 0. instead of 1., the value the kernel should return for two identical points, and the regression result is wrong.
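The effect can be illustrated without any quantum backend. The following is a minimal sketch, not the library code: toy_kernel_entry is a hypothetical classical stand-in (a Gaussian kernel) for the circuit-based overlap estimate, and the sketch assumes the kernel matrix starts out filled with zeros, which is what the observed 0. entries suggest.

```python
import numpy as np


def toy_kernel_entry(x, y):
    # Hypothetical stand-in for the overlap estimated on the backend.
    # Like a fidelity kernel, it returns exactly 1.0 for identical points.
    return float(np.exp(-np.sum((x - y) ** 2)))


def evaluate_nonsymmetric(x_vec, y_vec, skip_identical):
    # Assumption: the matrix starts at zero and only computed pairs are filled in.
    kernel = np.zeros((x_vec.shape[0], y_vec.shape[0]))
    for i, x_i in enumerate(x_vec):
        for j, y_j in enumerate(y_vec):
            if skip_identical and np.all(x_i == y_j):
                continue  # pair is never computed, so the entry stays 0.0
            kernel[i, j] = toy_kernel_entry(x_i, y_j)
    return kernel


x_test = np.array([[0.0], [0.5], [1.0]])
x_train = np.array([[0.5], [2.0]])  # x_train[0] equals x_test[1]

k_skipped = evaluate_nonsymmetric(x_test, x_train, skip_identical=True)
k_full = evaluate_nonsymmetric(x_test, x_train, skip_identical=False)

print(k_skipped[1, 0], k_full[1, 0])  # 0.0 1.0
```

Only the entry for the identical pair differs between the two matrices; all other entries agree.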

How can we reproduce the issue?

Create a simple test function for the regression, where a kernel evaluated on a quantum computer is used.

# General Imports
import numpy as np

# Scikit Imports
from sklearn.preprocessing import StandardScaler

# Qiskit Imports
from qiskit import Aer
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit_machine_learning.kernels import QuantumKernel

# backends
statevec = Aer.get_backend('statevector_simulator')
qasm = Aer.get_backend('qasm_simulator')

# Function to regress: in this example x*sin(x)
X = np.linspace(start=0, stop=10, num=100).reshape(-1, 1)
y = np.squeeze( X*np.sin(X) )

rng = np.random.RandomState(10)
training_indices = rng.choice(np.arange(y.size), size=20, replace=False)
training_indices = np.sort(training_indices)
X_train, y_train = X[training_indices], y[training_indices]
print(training_indices)
# in this case all 20 training points are identical to testing points
# Scaling of input feature
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
y_scaled = scaler.fit_transform(y.reshape(-1, 1))
X_train = X_scaled[training_indices]
y_train = y_scaled[training_indices]
# artificial feature expansion:
# repeat the single feature so that each point has one entry per qubit used in the encoding
XX = np.column_stack((X_scaled,X_scaled))
X4_test = np.column_stack((XX,XX))
XX_train = np.column_stack((X_train,X_train))
X4_train = np.column_stack((XX_train,XX_train))

# define data encoding (a custom one is used in this case):
#custom parameterized quantum circuit
def PQC(qubits,layers,c):

    x = ParameterVector('x', length=qubits)
    theta = ParameterVector('θ',length=2*qubits)

    theta_range = np.linspace(start=0, stop=2*np.pi, num=2*qubits)
    rand_idx = np.random.choice(np.arange(theta_range.size), size=2*qubits, replace=False)
    theta_sample = theta_range[rand_idx]
    var_custom = QuantumCircuit(qubits)
    counter = 0

    for j in range(layers):
        for i in range(qubits):
            if i != 0:
                counter += 1
            var_custom.ry(c*x[i] + theta[i+counter],i)
            var_custom.rz(c*x[i] + theta[i+1+counter],i)
            if i == qubits-1:
                counter = 0

        if (j % 2) == 0:
            for i in range(qubits-1):
                if(i % 2) == 0:
                    var_custom.cx(i, i+1)
        else:
            for i in range(qubits-1):
                if(i % 2) == 1:
                    var_custom.cx(i, i+1)
    var_custom = var_custom.bind_parameters({theta: theta_sample})
    return var_custom

# Use a 4-qubit feature map to build two quantum kernels, one on the statevector
# simulator and one on the qasm simulator, so the kernel entries can be compared:
pqc_map = PQC(qubits=4,layers=2,c=1.0)
quantum_kernel_statevec = QuantumKernel(feature_map=pqc_map, quantum_instance=statevec)
quantum_kernel_qasm = QuantumKernel(feature_map=pqc_map, quantum_instance=qasm)

# Evaluate the non-symmetric test-training kernel matrix on both backends.
# The qasm result has 0. entries where there should be a 1.:
K_test_train_statevec = quantum_kernel_statevec.evaluate(x_vec=X4_test, y_vec=X4_train)
K_test_train_qasm = quantum_kernel_qasm.evaluate(x_vec=X4_test, y_vec=X4_train)

print(K_test_train_statevec)
print(K_test_train_qasm)
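Because the printed matrices are large (100 x 20), it can help to look only at the entries that belong to identical test/training points. The helper below is a sketch; identical_pairs is a hypothetical name and not part of Qiskit, and the small arrays are stand-ins for X4_test and X4_train.

```python
import numpy as np


def identical_pairs(x_vec, y_vec):
    """Return index arrays (rows, cols) where row i of x_vec equals row j of y_vec."""
    # Broadcast to shape (n_x, n_y, n_features) and compare feature-wise.
    equal = np.all(x_vec[:, None, :] == y_vec[None, :, :], axis=-1)
    return np.nonzero(equal)


# Small stand-in data; in the script above use X4_test and X4_train instead.
x_test = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x_train = np.array([[1.0, 1.0], [3.0, 3.0]])

rows, cols = identical_pairs(x_test, x_train)
print(list(zip(rows.tolist(), cols.tolist())))  # [(1, 0)]
```

With the variables from the script, K_test_train_statevec[rows, cols] and K_test_train_qasm[rows, cols] can then be compared directly for the affected entries.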

What should happen?

The evaluate function of the Qiskit QuantumKernel class contains a line that excludes every data pair whose two points have the same value. For non-symmetric kernel matrices these entries are then never computed, although they should be 1. (or close to 1., within a small tolerance, if a noisy backend is used). The following code is copied from the evaluate function of the QuantumKernel class; a comment marks the line where, in my opinion, the bug lies:

        else:  # not using state vector simulator
            feature_map_params_x = ParameterVector("par_x", self._feature_map.num_parameters)
            feature_map_params_y = ParameterVector("par_y", self._feature_map.num_parameters)
            parameterized_circuit = self.construct_circuit(
                feature_map_params_x,
                feature_map_params_y,
                measurement=measurement,
                is_statevector_sim=is_statevector_sim,
            )
            parameterized_circuit = self._quantum_instance.transpile(
                parameterized_circuit, pass_manager=self._quantum_instance.unbound_pass_manager
            )[0]

            for idx in range(0, len(mus), self._batch_size):
                to_be_computed_data_pair = []
                to_be_computed_index = []
                for sub_idx in range(idx, min(idx + self._batch_size, len(mus))):
                    i = mus[sub_idx]
                    j = nus[sub_idx]
                    x_i = x_vec[i]
                    y_j = y_vec[j]
                    # BUG (non-symmetric matrices): identical pairs are skipped by the check below
                    if not np.all(x_i == y_j):
                        to_be_computed_data_pair.append((x_i, y_j))
                        to_be_computed_index.append((i, j))

                circuits = [
                    parameterized_circuit.assign_parameters(
                        {feature_map_params_x: x, feature_map_params_y: y}
                    )
                    for x, y in to_be_computed_data_pair
                ]

Any suggestions?

Remove the condition if not np.all(x_i == y_j) in the evaluate function shown above, so that the two append calls below it always run. All index pairs of a non-symmetric matrix are then included in the inner product computation, and the regression method of choice produces valid results with a quantum kernel.
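Until the library is changed, the affected entries could also be corrected on the user side after evaluation. This is only a sketch of a possible workaround: patch_identical_pairs is a hypothetical helper, not part of Qiskit, and it assumes that the ideal value 1. is acceptable for identical points (a noisy backend would return a value slightly different from 1.).

```python
import numpy as np


def patch_identical_pairs(kernel, x_vec, y_vec, value=1.0):
    """Return a copy of kernel with the entries of identical (x_i, y_j) pairs set to value."""
    # Boolean mask of shape (n_x, n_y): True where the two points coincide.
    equal = np.all(x_vec[:, None, :] == y_vec[None, :, :], axis=-1)
    patched = np.array(kernel, dtype=float, copy=True)
    patched[equal] = value
    return patched


# Stand-in data: the 0.0 at position (1, 0) mimics the skipped entry.
x_test = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x_train = np.array([[1.0, 1.0], [3.0, 3.0]])
kernel = np.array([[0.2, 0.1], [0.0, 0.3], [0.4, 0.05]])

patched = patch_identical_pairs(kernel, x_test, x_train)
print(patched[1, 0])  # 1.0
```

The input matrix is left unchanged, so the patched and unpatched results can still be compared.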
