qiboteam / qibo

A full-stack framework for quantum computing.
https://qibo.science
Apache License 2.0

Fix `pytorch` gradients #1450

Closed Simone-Bordoni closed 1 month ago

Simone-Bordoni commented 2 months ago

Fix #1449

BrunoLiegiBastonLiegi commented 2 months ago

I found a potential problem when you cast the parameters of parametrized gates (https://github.com/qiboteam/qibo/blob/1b6eb8a358f8f51d0f5a267303809724505cfb53/src/qibo/backends/pytorch.py#L32). Here you cast to self.dtype, which is usually torch.complex128, but it should actually be torch.float32/64, since this is a parameter. I don't know whether this is the origin of the problem with the gradients, though.

EDIT: nevermind, you necessarily have to cast to complex, otherwise pytorch complains about mismatching dtypes when you build the matrix elements...
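
For context, a minimal torch sketch of the dtype issue discussed above; the gate-matrix construction here is purely illustrative and not qibo's actual code:

```python
import torch

# Real rotation angle as a leaf tensor we want gradients for.
theta = torch.tensor(0.123, dtype=torch.float64, requires_grad=True)

# Building a complex gate matrix directly from the float64 parameter tends to
# fail with a dtype mismatch, so the parameter is promoted to complex first.
# The real-to-complex cast is differentiable, so gradients still reach theta.
ctheta = theta.to(torch.complex128)
matrix = torch.stack(
    [
        torch.cos(ctheta / 2), -1j * torch.sin(ctheta / 2),
        -1j * torch.sin(ctheta / 2), torch.cos(ctheta / 2),
    ]
).reshape(2, 2)

loss = matrix.real.sum()
loss.backward()
print(theta.grad)  # non-zero gradient flowing back to the real parameter
```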

Simone-Bordoni commented 2 months ago

> I found a potential problem when you cast the parameters of parametrized gates:
>
> https://github.com/qiboteam/qibo/blob/1b6eb8a358f8f51d0f5a267303809724505cfb53/src/qibo/backends/pytorch.py#L32
>
> Here you cast to self.dtype, which is usually torch.complex128, but it should actually be torch.float32/64, since this is a parameter. I don't know whether this is the origin of the problem with the gradients, though. EDIT: nevermind, you necessarily have to cast to complex, otherwise pytorch complains about mismatching dtypes when you build the matrix elements...

Actually, this may be part of the problem. I am rewriting the whole casting function in a better way, differentiating the matrix_dtype from the parameters_dtype.

Simone-Bordoni commented 1 month ago

Now the gradients are passing, however their final value is always zero. I have tried visualizing the computational graph (shown below) and all the operations seem to be performed correctly. I am currently testing a simple circuit with just one rotation gate (test_torch_gradients.py). In the next days I will not be able to work further on this issue; I will resume working on the gradients next week. In the meantime, if you find any possible reason for this problem let me know, it would be very helpful.

[image: computational graph of the loss]

Simone-Bordoni commented 1 month ago

> Now the gradients are passing, however their final value is always zero. I have tried visualizing the computational graph (shown below) and all the operations seem to be performed correctly. I am currently testing a simple circuit with just one rotation gate (test_torch_gradients.py). In the next days I will not be able to work further on this issue; I will resume working on the gradients next week. In the meantime, if you find any possible reason for this problem let me know, it would be very helpful.

I found out the problem was with the target state. Now the gradients are passing correctly; as soon as possible I will clean up the code and it should be ready for a review.
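
To make the kind of check being described concrete, a rough sketch along these lines (the exact API usage and the contents of test_torch_gradients.py are assumptions, not the actual test):

```python
import torch
import qibo
from qibo import Circuit, gates

qibo.set_backend("pytorch")

# Hypothetical target: rotate |0> towards |1> with a single RX gate.
target_state = torch.tensor([0.0 + 0j, 1.0 + 0j], dtype=torch.complex128)

theta = torch.tensor(0.1, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.1)

for _ in range(10):
    optimizer.zero_grad()
    circuit = Circuit(1)
    circuit.add(gates.RX(0, theta=theta))
    final_state = circuit().state()
    fidelity = torch.abs(torch.sum(torch.conj(target_state) * final_state)) ** 2
    loss = 1 - fidelity
    loss.backward()  # theta.grad should be non-zero if backpropagation works
    optimizer.step()
```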

Simone-Bordoni commented 1 month ago

I think that now everything has been fixed. The changes required were bigger than expected: I had to add the backend in the gate decompositions and in other parts. I have added a test to check the correct backpropagation in test_torch_gradients.py, so that it can be easily moved to qiboml.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.93%. Comparing base (36ed3fe) to head (3f22d8d). Report is 51 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1450      +/-   ##
==========================================
- Coverage   99.93%   99.93%   -0.01%
==========================================
  Files          81       81
  Lines       11820    11790      -30
==========================================
- Hits        11812    11782      -30
  Misses          8        8
```

| Flag | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/qiboteam/qibo/pull/1450/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | `99.93% <100.00%> (-0.01%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam#carryforward-flags-in-the-pull-request-comment) to find out more.


BrunoLiegiBastonLiegi commented 1 month ago

Thanks @Simone-Bordoni, #1449 is still failing even with this patch, but I think that's more related to a problem with the circuit.set/get_parameters().

Simone-Bordoni commented 1 month ago

> Thanks @Simone-Bordoni, #1449 is still failing even with this patch, but I think that's more related to a problem with the circuit.set/get_parameters().

In the new version backend.cast has requires_grad=False by default; you need to modify #1449 and enable the gradients when casting.

BrunoLiegiBastonLiegi commented 1 month ago

Like this?

```python
self.circuit_parameters = torch.nn.Parameter(BACKEND.cast(circuit.get_parameters(), dtype=torch.float64, requires_grad=True))
```

because this fails anyway.

EDIT: sorry, maybe I should have specified this better: it doesn't fail in the sense that the gradients don't pass, it actually crashes with an error:

```
RuntimeError: Output 0 of UnbindBackward0 is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
```
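
For reference, a minimal, hypothetical reproduction of this class of error, unrelated to qibo's actual code path:

```python
import torch

x = torch.rand(3, requires_grad=True)
# unbind returns several views of the same base tensor; modifying one of
# these views (or their base) in place leads to errors like the one quoted
# above. Replacing the in-place update with an out-of-place one avoids it.
a, b, c = (x * 2).unbind()

# a += 1.0              # in-place update of a view: triggers the RuntimeError
a = a + 1.0             # out-of-place replacement works

(a + b + c).backward()
```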
Simone-Bordoni commented 1 month ago

> Like this?
>
> ```python
> self.circuit_parameters = torch.nn.Parameter(BACKEND.cast(circuit.get_parameters(), dtype=torch.float64, requires_grad=True))
> ```
>
> because this fails anyway.
>
> EDIT: sorry, maybe I should have specified this better: it doesn't fail in the sense that the gradients don't pass, it actually crashes with an error:
>
> ```
> RuntimeError: Output 0 of UnbindBackward0 is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
> ```

OK, I thought that the problem was the gradient not passing. Is it a different error with respect to what you were getting before this update?

BrunoLiegiBastonLiegi commented 1 month ago

Yes, if I remember correctly, before it used to run without crashing but had the gradient flow problem.

Simone-Bordoni commented 1 month ago

> Yes, if I remember correctly, before it used to run without crashing but had the gradient flow problem.

Ok, anyway I think it would be better to first merge this PR, as it is getting quite big, and then try to find out why this specific case is not working correctly.

Simone-Bordoni commented 1 month ago

@alecandido

Simone-Bordoni commented 1 month ago

@alecandido I have talked to @renatomello regarding the error that arises when running the tests on GitHub (locally they pass). The test test_backends_torch_gradients computes the gradient and an update step using both tensorflow and torch and checks that the results are the same. However, tensorflow is enabled during the tests only when using ubuntu, so the test fails on windows. Is there a way to run the test only on ubuntu, or how can I solve the problem in another way?

BrunoLiegiBastonLiegi commented 1 month ago

I think you can use sys.platform to check which system the test is running under and skip it:

```python
import sys

import pytest


@pytest.mark.skipif(sys.platform == "win32", reason="Tensorflow not available under windows.")
def test_...
```
Simone-Bordoni commented 1 month ago

> Thanks @Simone-Bordoni, looks good. The only two things that I am not a fan of are:
>
>   • The new requires_grad argument added to cast. I would remove that and, when possible, move the functions where it is needed inside the backend as methods (like the _curve_fit case discussed above). For the cases where that is not possible I would create a specific backend.cast_parameter():
>
> ```python
> # numpy
> def cast_parameter(self, x, **kwargs):
>     return self.cast(x, **kwargs)
>
> # torch
> def cast_parameter(self, x, **kwargs):
>     return self.cast(x, **kwargs).requires_grad_()
> ```
>
>   • The fact that the decomposition needs the backend now. Ideally, I don't think this should be the case, but I don't know whether it's possible to avoid in practice.

Regarding the decomposition, as it is now you need the backend. Regarding cast_parameter, I can add it to the other backends, however the correction you are suggesting is not possible, as cast_parameter differs a lot from the cast function:

  • cast_parameter tries to convert integers to floats in torch;
  • the default dtype for cast_parameter is torch float64 (self.parameter_dtype), as opposed to cast, which uses complex128 (self.dtype) for matrices;
  • cast performs different operations on torch trying to make it behave the same as tensorflow, for example regarding lists of arrays.

So I agree on adding a cast_parameter function to all backends, but it should not be related to the standard cast().

Simone-Bordoni commented 1 month ago

> Regarding the decomposition, as it is now you need the backend. Regarding cast_parameter, I can add it to the other backends, however the correction you are suggesting is not possible, as cast_parameter differs a lot from the cast function:
>
>   • cast_parameter tries to convert integers to floats in torch;
>   • the default dtype for cast_parameter is torch float64 (self.parameter_dtype), as opposed to cast, which uses complex128 (self.dtype) for matrices;
>   • cast performs different operations on torch trying to make it behave the same as tensorflow, for example regarding lists of arrays.
>
> So I agree on adding a cast_parameter function to all backends, but it should not be related to the standard cast().

Actually, cast_parameter() was made to cast a single parameter for matrix_parametrized(). An idea could be to make it a helper method in the torch backend and to create a new public method like cast_parameters() to be used in these situations. Anyway, I don't know if @renatomello agrees on adding this new function to the numpy backend to avoid different backend behaviours.
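
As a rough illustration of the distinction being discussed (a hypothetical sketch, not the PR's actual implementation; the class and attribute names below are made up), a torch-side cast_parameter could differ from the matrix-oriented cast() roughly like this:

```python
import torch


class SketchTorchBackend:
    """Hypothetical backend sketch; qibo's real PyTorchBackend differs."""

    def __init__(self):
        self.dtype = torch.complex128          # default dtype for gate matrices
        self.parameter_dtype = torch.float64   # default dtype for gate parameters

    def cast(self, x, dtype=None):
        # Matrices are cast to complex by default.
        return torch.as_tensor(x, dtype=dtype or self.dtype)

    def cast_parameter(self, x, dtype=None):
        # Integers are promoted to float so that gradients can flow,
        # and parameters stay real (float64) rather than complex.
        if isinstance(x, int):
            x = float(x)
        return torch.as_tensor(x, dtype=dtype or self.parameter_dtype)
```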