Thank you for the detailed thought here. Since this is against the Qiskit textbook, the issue should ideally be opened against the repository https://github.com/qiskit-community/qiskit-textbook so that it gets to the authors/creators of that textbook and that specific section. I will see if I can get this transferred there.
@woodsp-ibm Thanks for your help! I'm very sorry for not opening this issue against the correct repository. If it turns out to be difficult to transfer, I could also open the same issue against the repository for the Qiskit textbook.
No worries! I do not think it's easy to find any information on how to report issues you might find in the textbook. I just have to find someone with enough permission to move this, as I do not have the required rights on the textbook repository to do so myself.
@woodsp-ibm Got it. Thanks for that! :smiley:
@cyx617 It seems, after finding someone with enough permission, that GitHub does not allow issues to be transferred between repositories in different organizations, i.e. from Qiskit here to Qiskit-Community, which holds the textbook repository.
Therefore, might I ask you to raise a new issue there - hopefully it's quick to copy/paste the text you have already written - and then close this one off once you have done that. Thanks.
@woodsp-ibm Thanks for letting me know. I'll open a new issue in Qiskit-Community and then close this one.
What is the expected enhancement?
Chapter 4.1.5 of the Qiskit textbook (https://qiskit.org/textbook/ch-machine-learning/machine-learning-qiskit-pytorch.html) shows a very intriguing example of a hybrid quantum-classical neural network built with PyTorch and Qiskit, and it demonstrates how Qiskit and PyTorch can be integrated for quantum machine learning. However, some details of the code might need to be reconsidered. For example, when creating the hybrid neural network, we use the code below.
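Roughly, the relevant part of the model looks like this (the channel counts here are from memory and only meant to illustrate the structure; the quantum `Hybrid` layer defined earlier in the chapter is left as a comment):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional feature extractor; with these (assumed) channel counts and
        # no pooling, a 1 x 28 x 28 MNIST image becomes a 16 x 20 x 20 = 6400-element map
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(256, 64)
        self.fc2 = nn.Linear(64, 1)
        # self.hybrid = Hybrid(...)  # quantum layer defined earlier in the chapter

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.dropout(x)
        x = x.view(-1, 256)   # the reshape discussed below
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # x = self.hybrid(x)  # quantum expectation value
        return x
```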
If you print out the shape of the torch tensor at each layer of the neural network, you would see that after the line `x = x.view(-1, 256)` the input tensors for the fc1 and fc2 layers have shapes (25, 256) and (25, 64) respectively. In PyTorch, the input to an fc (i.e. linear) layer should be a tensor of shape `[batch_size, in_features]` (as described in https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear). So the input tensor of shape (25, 256) for the fc1 layer means we have a batch size of 25, whereas we actually set the batch size to 1 in both `train_loader` and `test_loader`.
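To make this concrete, here is a quick shape check (assuming, for illustration, that the convolutional layers produce a 16 × 20 × 20 feature map for a single sample):

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(256, 64)
fc2 = nn.Linear(64, 1)

# One sample from the loader (batch size 1) after the convolutional layers:
# 16 channels of 20 x 20, i.e. 6400 values in total (illustrative shape)
x = torch.randn(1, 16, 20, 20)

x = x.view(-1, 256)
print(x.shape)                        # torch.Size([25, 256]) -> looks like a batch of 25
print(fc1(x).shape)                   # torch.Size([25, 64])
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([25, 1])
```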
This is a bit confusing. Furthermore, since the fc layer in PyTorch applies the linear transformation only to the last dimension (i.e. `in_features`) of the input tensor, the output of the fc2 layer would be of size (25, 1). This tensor is then fed into the `Hybrid` module and finally reaches `HybridFunction`, which defines the forward and backward pass computations. There seems to be an inconsistency in the number of parameters fed into the quantum circuit: only the first element of the (25, 1) input tensor is used in the forward pass, while all 25 elements are used to calculate the gradient during the backward pass. This might explain the long training time (model training took around 245 s per epoch when I ran the model from the Qiskit textbook page) and the fluctuating loss curve of our model shown in the Qiskit textbook.

The problem is due to the incorrect dimension of the input tensor for the fc1 layer, and it can be fixed by replacing the code `x = x.view(-1, 256)` with `x = x.view(1, -1)`, where -1 would infer the last dimension as 25 × 256 = 6400. However, considering the resulting large weight matrix (i.e. a 64 × 6400 matrix), this would add many more parameters to the model and hence increase the risk of overfitting as well as the training time. So I actually modified the convolutional layers so that we have a model of moderate size. Here is the code of the model architecture (sketched below), where I reduce the number of filters/kernels in each convolutional layer to ensure an input tensor of moderate size for the fc1 layer. I did not change any other part of the code in the example. I ran both the current and the new model in a Jupyter notebook on IBM Quantum Experience. It turned out that the new model achieved better performance in terms of both train/test loss and running time, as shown in the following table.
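Roughly, the modified architecture looks like the sketch below (the exact channel counts are illustrative; the point is that the flattened feature map stays small enough that fc1 keeps a modest number of parameters while the batch dimension stays at 1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Fewer filters per convolutional layer: a 1 x 28 x 28 input now yields
        # a 1 x 20 x 20 = 400-element feature map instead of 6400
        self.conv1 = nn.Conv2d(1, 2, kernel_size=5)
        self.conv2 = nn.Conv2d(2, 1, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(400, 64)
        self.fc2 = nn.Linear(64, 1)
        # self.hybrid = Hybrid(...)  # quantum layer, unchanged from the example

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.dropout(x)
        x = x.view(1, -1)     # (1, 400): batch size stays 1
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # x = self.hybrid(x)
        return x
```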
In addition, the training process of the new model is more stable, as shown in the training loss curve below, which might help build confidence for readers who would like to work in the field of quantum machine learning.
The current model in the example does actually run successfully, but some details might confuse readers, especially those who do not have much experience in machine learning or deep learning. So I am not sure whether it is necessary to update the code in chapter 4.1.5 of the Qiskit textbook, but I hope this report can help with the documentation enhancement.