stfnmangini closed this issue 3 years ago
I wouldn't mind investigating this issue if it's still up for grabs
Hi @jonvet, that would be lit. Though, I don't know if someone from the Qiskit team has already taken charge of this issue 🤷
I was supposed to have a look at this issue this week, but I'm perfectly fine with anybody else working on it.
@jonvet I noticed the PR; if it fixes the scenario posted in this issue, then it is likely good to go. I'll get back to the PR today/tomorrow. Thanks for taking care of the issue!
Information
What is the current behavior?
If a `CircuitQNN` is used with a `TorchConnector`, the resulting PyTorch model has issues calculating the gradients of the circuit parameters when the model is evaluated on a batch of data rather than on a single sample.

Steps to reproduce the problem
Here is an example to reproduce the problem. I use the same circuit defined in the tutorial (https://github.com/Qiskit/qiskit-machine-learning/blob/master/docs/tutorials/05_torch_connector.ipynb), using `CircuitQNN` and the `TorchConnector` to create a quantum neural network. I try to evaluate the gradients of the parameters on a regression task with a trivial dataset, consisting of 20 identical inputs and corresponding targets.

As a loss function, I consider `MSELoss` with `reduction="sum"`, and I try to evaluate the loss and its gradients in different ways (a minimal sketch follows below):

1. `MSELoss` on the full dataset (consisting of 20 identical items)
2. `(output - target).pow(2).sum()`, again on the whole dataset
3. `MSELoss` evaluated on each sample separately, summing the results over the full dataset
4. `MSELoss` on a single sample only

Then the gradients are evaluated using `loss.backward()` and extracted with `model.weights.grad`. Note that there is no optimizer step! I only evaluate the gradients, without updating the weights. All these methods should be fully equivalent, since the data are always the same and there is no seed, ordering, or other twist involved. Note that while Methods 1, 2, and 3 use the full dataset of 20 samples, Method 4 uses only a single item, so its gradient is expected to be 20 times smaller (since we are using `MSELoss(reduction="sum")`).
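The original reproduction script is not shown above, so here is a minimal sketch in its spirit, assuming the 2-qubit setup from the linked tutorial (`ZZFeatureMap` feature map, `RealAmplitudes` ansatz, parity interpretation). The names `X`, `y`, and the dataset values are placeholders of my choosing:

```python
import torch
from torch.nn import MSELoss

from qiskit import Aer, QuantumCircuit
from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap
from qiskit.utils import QuantumInstance
from qiskit_machine_learning.connectors import TorchConnector
from qiskit_machine_learning.neural_networks import CircuitQNN

# 2-qubit circuit as in the tutorial: feature map followed by an ansatz
feature_map = ZZFeatureMap(2)
ansatz = RealAmplitudes(2, reps=1)
qc = QuantumCircuit(2)
qc.append(feature_map, range(2))
qc.append(ansatz, range(2))

# Parity of the measured bitstring, mapped onto 2 output classes
def parity(x):
    return "{:b}".format(x).count("1") % 2

qnn = CircuitQNN(
    qc,
    input_params=feature_map.parameters,
    weight_params=ansatz.parameters,
    interpret=parity,
    output_shape=2,
    quantum_instance=QuantumInstance(Aer.get_backend("statevector_simulator")),
)
model = TorchConnector(qnn)

# Trivial dataset: 20 identical samples (the values here are arbitrary)
X = torch.full((20, 2), 0.5)
y = torch.zeros(20, 2)
loss_fn = MSELoss(reduction="sum")

# Method 1: MSELoss on the full batch
model.zero_grad()
loss_fn(model(X), y).backward()
print("Method 1:", model.weights.grad)

# Method 2: manual squared error, again on the full batch
model.zero_grad()
(model(X) - y).pow(2).sum().backward()
print("Method 2:", model.weights.grad)

# Method 3: MSELoss per sample, summed over the full dataset
model.zero_grad()
torch.stack([loss_fn(model(xi), yi) for xi, yi in zip(X, y)]).sum().backward()
print("Method 3:", model.weights.grad)

# Method 4: MSELoss on a single sample (expected ~1/20 of the others)
model.zero_grad()
loss_fn(model(X[0]), y[0]).backward()
print("Method 4:", model.weights.grad)
```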
What is the expected behavior?
The gradients should all be equal. Instead, evaluating the loss using a batch of data (Methods 1 and 2) yields vanishing gradients. Note that if one substitutes the quantum model created through Qiskit with a simple `model2 = torch.nn.Linear(2, 2)`, then the gradients are correctly equal, so the problem is somewhere in Qiskit's Machine Learning module. (To run the code above with the classical linear layer, also substitute `model2.weights.grad` with `model2.weight.grad`.)
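As a hedged illustration of that control experiment, reusing `X`, `y`, and `loss_fn` from the sketch above (the exact 20-times relation shown here is what the quantum model should also satisfy):

```python
# Classical control: swap the quantum model for a plain linear layer.
# Note the attribute is `weight` here, versus `weights` on TorchConnector.
model2 = torch.nn.Linear(2, 2)

model2.zero_grad()
loss_fn(model2(X), y).backward()
grad_batch = model2.weight.grad.clone()

model2.zero_grad()
loss_fn(model2(X[0]), y[0]).backward()
grad_single = model2.weight.grad.clone()

# With 20 identical samples and reduction="sum", the batch gradient is
# exactly 20 times the single-sample gradient for the classical layer.
print(torch.allclose(grad_batch, 20 * grad_single))  # expected: True
```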
Suggested solutions