pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Replace torch.lstsq with torch.linalg.lstsq in `perspective` transform #3843

Closed · fmassa closed this 3 years ago

fmassa commented 3 years ago

🚀 Feature

We recently started getting the following kind of warning:

/root/project/torchvision/transforms/functional.py:597: UserWarning: torch.lstsq is deprecated in favor of torch.linalg.lstsq and will be removed in a future PyTorch release.
torch.linalg.lstsq has reversed arguments and does not return the QR decomposition in the returned tuple (although it returns other information about the problem).
To get the qr decomposition consider using torch.linalg.qr.
The returned solution in torch.lstsq stored the residuals of the solution in the last m - n columns of the returned value whenever m > n. In torch.linalg.lstsq, the residuals are in the field 'residuals' of the returned named tuple.
The unpacking of the solution, as in
X, _ = torch.lstsq(B, A).solution[:A.size(1)]
should be replaced with
X = torch.linalg.lstsq(A, B).solution (Triggered internally at  /opt/conda/conda-bld/pytorch_1620975976061/work/aten/src/ATen/LegacyTHFunctionsCPU.cpp:389.)
  res = torch.lstsq(b_matrix, a_matrix)[0]

These warnings are even more visible in the newly added galleries.
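For reference, the migration the warning asks for looks roughly like this (a minimal sketch on a hypothetical random system, not the torchvision code):

import torch

A = torch.randn(8, 8, dtype=torch.float64)  # hypothetical coefficient matrix
B = torch.randn(8, 1, dtype=torch.float64)  # hypothetical right-hand side

# Deprecated form, kept as a comment since torch.lstsq is removed in recent
# PyTorch releases: note the reversed (B, A) argument order and manual slicing.
#   X = torch.lstsq(B, A).solution[:A.size(1)]

# New form: conventional (A, B) argument order, solution in a named field.
X = torch.linalg.lstsq(A, B).solution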

We should replace https://github.com/pytorch/vision/blob/b56f17ae1ae8a5d08067c7f7444af21fb3b59ca6/torchvision/transforms/functional.py#L597 with

res = torch.linalg.lstsq(a_matrix, b_matrix).solution

This way we stop using the deprecated function.
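For context, a_matrix is the 8x8 system assembled from the four point correspondences of the perspective transform and b_matrix is the corresponding right-hand side. A paraphrased sketch of the change in place (the assembly code is elided; names follow the linked source):

import torch

def solve_perspective_coeffs(a_matrix: torch.Tensor, b_matrix: torch.Tensor) -> list:
    # a_matrix: (8, 8) system built from the four point correspondences,
    # b_matrix: (8,) right-hand side; assembly of both is elided here.

    # Old, deprecated call with reversed arguments:
    #   res = torch.lstsq(b_matrix, a_matrix)[0]

    # Proposed replacement: conventional (A, b) order, named 'solution' field.
    res = torch.linalg.lstsq(a_matrix, b_matrix).solution
    return res.tolist()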

One caveat is that some tests might start failing (due to their inherent flakiness), and we should fix the tests that break along the way.

It would be good to fix this before the next release, so that users don't get those warnings.

cc @vfdev-5

datumbox commented 3 years ago

It seems that the two methods return slightly different results:

torch.linalg.lstsq(a_matrix, b_matrix).solution - torch.lstsq(b_matrix, a_matrix)[0].squeeze(1)
Out[1]: 
tensor([ 0.0000e+00, -4.8429e-08,  1.6689e-06, -1.4529e-07,  2.3842e-07,
         2.3842e-07, -3.9581e-09, -3.4925e-10])
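The deltas above are at single-precision roundoff scale. As a sanity check, one can compare the new solver against a direct solve with explicit tolerances (a minimal sketch on a hypothetical random system, not the actual perspective matrices):

import torch

torch.manual_seed(0)
a_matrix = torch.randn(8, 8)
b_matrix = torch.randn(8)

new = torch.linalg.lstsq(a_matrix, b_matrix).solution
# Direct solve as a reference; the system here is square and full rank.
ref = torch.linalg.solve(a_matrix, b_matrix)

# Agreement up to loose float32 tolerances; bitwise equality is not expected.
torch.testing.assert_close(new, ref, rtol=1e-4, atol=1e-5)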

Small as it is, this difference causes the test test_perspective_batch[None-dims_and_points0-cpu] to fail. See #3918 for details:

AssertionError: Tensors are not equal!

Mismatched elements: 3 / 2652 (0.1%)
Greatest absolute difference: 183 at (0, 2, 17)
Greatest relative difference: 1.0 at (0, 2, 17)
device = 'cpu'
dims_and_points = ((26, 34), [[[0, 0], [33, 0], [33, 25], [0, 25]], [[3, 2], [32, 3], [30, 24], [2, 25]]])
dt = None, tester = <test_functional_tensor.Tester testMethod=runTest>

    @pytest.mark.parametrize('device', cpu_and_gpu())
    @pytest.mark.parametrize('dims_and_points', _get_data_dims_and_points_for_perspective())
    @pytest.mark.parametrize('dt', [None, torch.float32, torch.float64, torch.float16])
    def test_perspective_batch(device, dims_and_points, dt, tester):

        if dt == torch.float16 and device == "cpu":
            # skip float16 on CPU case
            return

        data_dims, (spoints, epoints) = dims_and_points

        batch_tensors = tester._create_data_batch(*data_dims, num_samples=4, device=device)
        if dt is not None:
            batch_tensors = batch_tensors.to(dtype=dt)

        # Ignore the equivalence between scripted and regular function on float16 cuda. The pixels at
        # the border may be entirely different due to small rounding errors.
        scripted_fn_atol = -1 if (dt == torch.float16 and device == "cuda") else 1e-8
        tester._test_fn_on_batch(
            batch_tensors, F.perspective, scripted_fn_atol=scripted_fn_atol,
>           startpoints=spoints, endpoints=epoints, interpolation=NEAREST
        )

test/test_functional_tensor.py:786: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/test_functional_tensor.py:43: in _test_fn_on_batch
    assert_equal(transformed_img, transformed_batch[i, ...])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

actual = tensor([[[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,... 0],
         [  0,   0, 195,  ...,   0,   0,   0],
         [  0,   0, 226,  ...,   0,   0,   0]]], dtype=torch.uint8)
expected = tensor([[[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,... 0],
         [  0,   0, 195,  ...,   0,   0,   0],
         [  0,   0, 226,  ...,   0,   0,   0]]], dtype=torch.uint8)

The difference appears at:

(transformed_img.float()-transformed_batch[i, ...]).abs().max()
Out[2]: tensor(107.)

transformed_img[0][13][2], transformed_batch[i][0][13][2]
Out[3]: (tensor(0, dtype=torch.uint8), tensor(107, dtype=torch.uint8))
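The large per-pixel jump despite near-identical coefficients is consistent with interpolation=NEAREST: a tiny shift in a sampling coordinate can cross a rounding boundary and select an entirely different source pixel. A hypothetical illustration of such a boundary flip:

import torch

# Two sampling coordinates that differ by ~1e-6 (float64 so the gap is
# representable) fall on opposite sides of the nearest-pixel boundary.
coord_a = torch.tensor(13.4999995, dtype=torch.float64)
coord_b = torch.tensor(13.5000005, dtype=torch.float64)
print(torch.round(coord_a).item(), torch.round(coord_b).item())  # 13.0 14.0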

fmassa commented 3 years ago

Yes, I faced those issues when I tried it locally, which is why I mentioned that:

One caveat is that some tests might start failing (due to their inherent flakiness), and we should fix the tests that break along the way.

To me this is flakiness in the test, so I would be fine changing it so that the tests pass again.

NicolasHug commented 3 years ago

Just for reference, the failures weren't due to the test being flaky, nor to a difference between torch.linalg.lstsq and torch.lstsq: they were due to the numerical instability of torch.linalg.lstsq itself. Details in https://github.com/pytorch/vision/pull/3918#issuecomment-863930409 :)
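For anyone running into the same instability: torch.linalg.lstsq accepts a driver keyword, and on CPU the default driver is 'gelsy', whereas 'gels' is the QR-based routine closer to what the old torch.lstsq used. A hedged sketch of one possible mitigation (an illustration, not necessarily the fix that landed):

import torch

a_matrix = torch.randn(8, 8)
b_matrix = torch.randn(8)

# Pin the QR-based LAPACK driver and solve in float64, casting back at the
# end; both choices trade a little speed for numerical robustness.
res = torch.linalg.lstsq(
    a_matrix.to(torch.float64), b_matrix.to(torch.float64), driver="gels"
).solution.to(a_matrix.dtype)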

fmassa commented 3 years ago

Thanks for investigating this @NicolasHug!