Thank you for the detailed thought here. Since this is against the Qiskit textbook, the issue should ideally be opened against the repository https://github.com/qiskit-community/qiskit-textbook so that it gets to the authors/creators of that textbook and that specific section. I will see if I can get this transferred there.
@woodsp-ibm Thanks for your help! I'm very sorry for not opening this issue against the correct repository. If it turns out to be difficult to transfer, I could also open the same issue against the repository for the Qiskit textbook.
No worries! I do not think it's easy to find any information on how to report issues you might find in the textbook. I just have to find someone with enough permission to move this, as I do not have the required rights on the textbook repository to do so myself.
@woodsp-ibm Got it. Thanks for that! :smiley:
@cyx617 It seems, after finding someone with enough permission, that GitHub does not allow issues to be transferred between repositories in different organizations, i.e. from Qiskit here to Qiskit-Community, which holds the textbook repository.
Therefore, might I ask you to raise a new issue there - hopefully it's quick to copy/paste the text you have already written - and then close this one off once you have done that. Thanks.
@woodsp-ibm Thanks for letting me know. I'll open a new issue in Qiskit-Community and then close this one.
What is the expected enhancement?
Chapter 4.1.5 of the Qiskit textbook (https://qiskit.org/textbook/ch-machine-learning/machine-learning-qiskit-pytorch.html) shows a very intriguing example of a hybrid quantum-classical neural network built with PyTorch and Qiskit, and it demonstrates how Qiskit and PyTorch can be integrated for quantum machine learning. However, some details of the code might need to be reconsidered. For example, when creating the hybrid neural network, we use the code below.
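Roughly, the relevant part of the model looks like this (the channel counts here are from memory and only meant to illustrate the structure; the quantum `Hybrid` layer defined earlier in the chapter is left as a comment):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional feature extractor; with these (assumed) channel counts and
        # no pooling, a 1 x 28 x 28 MNIST image becomes a 16 x 20 x 20 = 6400-element map
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(256, 64)
        self.fc2 = nn.Linear(64, 1)
        # self.hybrid = Hybrid(...)  # quantum layer defined earlier in the chapter

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.dropout(x)
        x = x.view(-1, 256)   # the reshape discussed below
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # x = self.hybrid(x)  # quantum expectation value
        return x
```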
If you print out the shape of the torch tensor at each layer of the neural network, you would see that after the line `x = x.view(-1, 256)` the input tensors for the fc1 and fc2 layers have shapes (25, 256) and (25, 64) respectively. In PyTorch, the input to an fc (i.e. linear) layer should be a tensor of shape `[batch_size, in_features]` (as described in https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear). So the input tensor of shape (25, 256) for the fc1 layer means we have a batch size of 25, whereas we actually set the batch size to 1 in both `train_loader` and `test_loader`.
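To make this concrete, here is a quick shape check (assuming, for illustration, that the convolutional layers produce a 16 × 20 × 20 feature map for a single sample):

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(256, 64)
fc2 = nn.Linear(64, 1)

# One sample from the loader (batch size 1) after the convolutional layers:
# 16 channels of 20 x 20, i.e. 6400 values in total (illustrative shape)
x = torch.randn(1, 16, 20, 20)

x = x.view(-1, 256)
print(x.shape)                        # torch.Size([25, 256]) -> looks like a batch of 25
print(fc1(x).shape)                   # torch.Size([25, 64])
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([25, 1])
```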
This is a bit confusing. Furthermore, since the fc layer in PyTorch applies the linear transformation only to the last dimension (i.e. `in_features`) of the input tensor, the output of the fc2 layer would be of size (25, 1). This tensor is then fed into the `Hybrid` module and finally reaches `HybridFunction`, which defines the forward and backward pass computations. There seems to be an inconsistency in the number of parameters fed into the quantum circuit: only the first element of the (25, 1) input tensor is used in the forward pass, while all 25 elements are used to calculate the gradient during the backward pass. This might explain the long training time (model training took around 245 s per epoch when I ran the model from the Qiskit textbook page) and the fluctuating loss curve of our model shown in the Qiskit textbook.

The problem is due to the incorrect dimension of the input tensor for the fc1 layer, and it can be fixed by replacing the code `x = x.view(-1, 256)` with `x = x.view(1, -1)`, where -1 would infer the last dimension as 25 × 256 = 6400. However, considering the resulting large weight matrix (i.e. a 64 × 6400 matrix), this would add many more parameters to the model and hence increase the risk of overfitting as well as the training time. So I actually modified the convolutional layers so that we have a model of moderate size. Here is the code of the model architecture (sketched below), where I reduce the number of filters/kernels in each convolutional layer to ensure an input tensor of moderate size for the fc1 layer. I did not change any other part of the code in the example. I ran both the current and the new model in a Jupyter notebook on IBM Quantum Experience. It turned out that the new model achieved better performance in terms of both train/test loss and running time, as shown in the following table.
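Roughly, the modified architecture looks like the sketch below (the exact channel counts are illustrative; the point is that the flattened feature map stays small enough that fc1 keeps a modest number of parameters while the batch dimension stays at 1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Fewer filters per convolutional layer: a 1 x 28 x 28 input now yields
        # a 1 x 20 x 20 = 400-element feature map instead of 6400
        self.conv1 = nn.Conv2d(1, 2, kernel_size=5)
        self.conv2 = nn.Conv2d(2, 1, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(400, 64)
        self.fc2 = nn.Linear(64, 1)
        # self.hybrid = Hybrid(...)  # quantum layer, unchanged from the example

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.dropout(x)
        x = x.view(1, -1)     # (1, 400): batch size stays 1
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # x = self.hybrid(x)
        return x
```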
In addition, the training process of the new model is more stable, as shown in the training loss curve below, which might help build confidence for readers who would like to work in the field of quantum machine learning.
The current model in the example does actually run successfully, but some details might confuse readers, especially those who do not have much experience in machine learning or deep learning. So I am not sure whether it is necessary to update the code in chapter 4.1.5 of the Qiskit textbook, but I hope this report can help with the documentation enhancement.