Closed: Simone-Bordoni closed this 1 month ago
I found a potential problem when you cast the parameters of parametrized gates:
https://github.com/qiboteam/qibo/blob/1b6eb8a358f8f51d0f5a267303809724505cfb53/src/qibo/backends/pytorch.py#L32
here you cast to `self.dtype`, which is usually `torch.complex128`, but it should actually be `torch.float32/64`, since this is a parameter. I don't know whether this is the origin of the problem with the gradients, though.
EDIT: never mind, you necessarily have to cast to complex, otherwise PyTorch complains about mismatching dtypes when you build the matrix elements...
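For reference, a small stand-alone sanity check (mine, not the backend code) suggesting that casting the real parameter to `torch.complex128` before building a rotation matrix does not by itself break the gradient flow:

```python
import torch

theta = torch.tensor(0.3, dtype=torch.float64, requires_grad=True)

# Cast the real parameter to the (complex) matrix dtype before building the gate;
# the cast itself is differentiable.
t = theta.to(torch.complex128)
cos, sin = torch.cos(t / 2), torch.sin(t / 2)
rx = torch.stack([torch.stack([cos, -1j * sin]),
                  torch.stack([-1j * sin, cos])])

state = rx @ torch.tensor([1.0 + 0j, 0.0 + 0j], dtype=torch.complex128)
loss = state.abs().pow(2)[1]   # probability of measuring |1>
loss.backward()
print(theta.grad)              # ~0.5 * sin(theta), non-zero
```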
Actually this may be part of the problem. I am rewriting the whole casting function in a better way, differentiating the matrix_dtype from the parameters_dtype.
Now the gradients are passing, but their final value is always zero. I have tried visualizing the computational graph (shown below) and all the operations seem to be performed correctly. I am currently testing a simple circuit with just one rotation gate (test_torch_gradients.py). In the next days I will not be able to work further on this issue; I will resume working on the gradients next week. In the meantime, if you find any possible reason for this problem let me know, it would be very helpful.
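For anyone reproducing this, the graph can be visualized for instance with torchviz (this is my setup, not necessarily what test_torch_gradients.py uses):

```python
import torch
from torchviz import make_dot  # pip install torchviz

theta = torch.tensor(0.1, dtype=torch.float64, requires_grad=True)
loss = torch.cos(theta / 2) ** 2  # stand-in for the circuit-based loss
make_dot(loss, params={"theta": theta}).render("graph", format="png")
```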
I found out the problem was with the target state. Now the gradients are passing correctly; as soon as possible I will clean up the code and it should be ready for a review.
I think that now everything has been fixed. The changes required were bigger than expected: I had to add the backend to the gate decompositions and to other parts. I have added a test to check the correct backpropagation in test_torch_gradients.py so that it can easily be moved to qiboml.
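Roughly, the added check goes along these lines (a simplified sketch, not the exact test, and assuming the `"pytorch"` backend name and the standard circuit API):

```python
import torch
import qibo
from qibo import Circuit, gates

qibo.set_backend("pytorch")

target_state = torch.tensor([0.0 + 0j, 1.0 + 0j], dtype=torch.complex128)
params = torch.tensor([0.1], dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.SGD([params], lr=0.1)

circuit = Circuit(1)
circuit.add(gates.RX(0, theta=0.0))

for _ in range(10):
    optimizer.zero_grad()
    circuit.set_parameters(params)
    state = circuit().state()
    fidelity = torch.abs(torch.sum(torch.conj(target_state) * state)) ** 2
    loss = 1 - fidelity
    loss.backward()  # gradients must reach the leaf `params`
    assert params.grad is not None
    optimizer.step()
```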
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.93%. Comparing base (36ed3fe) to head (3f22d8d). Report is 51 commits behind head on master.
Thanks @Simone-Bordoni, #1449 is still failing even with this patch, but I think that's more related to a problem with `circuit.set/get_parameters()`.
In the new version `backend.cast` has `requires_grad=False` by default; you need to modify #1449 and activate the gradients when casting.
Like this?
```python
self.circuit_parameters = torch.nn.Parameter(
    BACKEND.cast(circuit.get_parameters(), dtype=torch.float64, requires_grad=True)
)
```
because this fails anyway.
EDIT: sorry, maybe I should have specified this better: it doesn't fail in the sense that the gradients don't pass, it actually crashes with an error:
```
RuntimeError: Output 0 of UnbindBackward0 is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
```
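For context, here is a minimal qibo-independent snippet reproducing this class of error (not necessarily the exact code path involved here): the outputs of `torch.unbind` are views, and once their base has been modified in place they can no longer be used by autograd.

```python
import torch

params = torch.nn.Parameter(torch.zeros(2, dtype=torch.float64))
base = params * 1.0        # non-leaf tensor that gets viewed below
a, b = torch.unbind(base)  # a and b are views of base
base.add_(1.0)             # in-place modification of the base
loss = a.sum()             # using the view afterwards raises the
                           # "Output 0 of UnbindBackward0 is a view ..." RuntimeError
```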
OK, I thought that the problem was the gradient not passing. Is it a different error with respect to what you were getting before this update?
Yes, if I remember correctly before it used to run without crashing but had the gradients flow problem.
Ok, anyway I think it would be better to first merge this PR, as it is getting quite big, and then try to find out why this specific case is not working correctly.
@alecandido I have talked to @renatomello regarding the error that arises when running the tests on GitHub (locally they pass). The test `test_backends_torch_gradients` computes the gradient and an update step using both tensorflow and torch and checks that the results are the same. However, tensorflow is enabled during the tests only on Ubuntu, so the test fails on Windows. Is there a way to run the test only on Ubuntu, or how else can I solve the problem?
I think you can use `sys.platform` to check which system the test is running under and skip it:

```python
@pytest.mark.skipif(sys.platform == "win32", reason="Tensorflow not available under windows.")
def test_...
```
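As an alternative (just a suggestion), pytest can also skip based on whether TensorFlow is importable at all, independently of the platform:

```python
import pytest

tf = pytest.importorskip("tensorflow")  # skips the test module if tensorflow is missing

def test_backends_torch_gradients():
    ...
```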
Thanks @Simone-Bordoni, looks good. The only two things that I am not a fan of are:
- The new `requires_grad` argument added to `cast`. I would remove that and, when possible, move the functions where it is needed inside the backend as methods (like the `_curve_fit` case discussed above). For the cases where that is not possible I would create a specific `backend.cast_parameter()`:

```python
# numpy
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs)

# torch
def cast_parameter(self, x, **kwargs):
    return self.cast(x, **kwargs).requires_grad_()
```

- The fact that the decomposition needs the backend now. Ideally, I don't think this should be the case, but I don't know whether it is possible to avoid in practice.
Regarding the decomposition, as it is now you need the backend. Regarding `cast_parameter`, I can add it to the other backends; however, the correction you are suggesting is not possible, as `cast_parameter` differs a lot from the `cast` function:
- `cast_parameter` tries to convert integers to floats in torch;
- the default dtype for `cast_parameter` is torch float64 (`self.parameter_dtype`), as opposed to `cast`, which uses complex128 (`self.dtype`) for matrices;
- `cast` performs different operations in torch to make it behave the same as tensorflow, for example regarding lists of arrays.

So I agree on adding a cast-parameter function to all backends, but this should not be related to the standard `cast()` (a rough sketch of what the torch version might look like follows below).
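A minimal sketch of such a torch-side `cast_parameter`, assuming hypothetical attribute names `dtype` and `parameter_dtype` (not necessarily the ones used in the PR):

```python
import torch

class TorchLikeBackend:
    # Hypothetical attributes: dtype for gate matrices, parameter_dtype for gate parameters.
    dtype = torch.complex128
    parameter_dtype = torch.float64

    def cast_parameter(self, x, dtype=None):
        """Cast a single gate parameter (sketch, not the PR implementation)."""
        dtype = dtype or self.parameter_dtype
        if isinstance(x, int):
            # integers are promoted to float so they can carry gradients
            x = float(x)
        if isinstance(x, torch.Tensor):
            return x.to(dtype)
        return torch.tensor(x, dtype=dtype)
```

Whether the resulting tensor should have `requires_grad` enabled by default is exactly the design point discussed above, so the sketch leaves that to the caller.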
Actually, `cast_parameter()` was made to cast a single parameter for `matrix_parametrized()`. An idea could be making it a helper method in the torch backend and creating a new public method like `cast_parameters()` to be used in these situations. Anyway, I don't know if @renatomello agrees on adding this new function to the numpy backend to avoid different backend behaviours.
Fix #1449