pytorch / ignite

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
https://pytorch-ignite.ai

Move contrib metrics files #3220

Closed: leej3 closed this 6 months ago

leej3 commented 6 months ago

Same migration pattern as #3204:

References are updated as part of this PR to avoid failures when building the docs.
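
For readers skimming the diff, the reference updates boil down to swapping the contrib import path for the core one. A minimal sketch, using `MeanError` (the metric that appears in the test failure below) as an assumed example; the authoritative list of moved files is the PR diff itself:

```python
# Before this PR: regression metrics lived under the contrib namespace.
# from ignite.contrib.metrics.regression import MeanError

# After this PR: the same metric is imported from the core namespace.
from ignite.metrics.regression import MeanError

metric = MeanError()  # usage is unchanged; only the import path moves
```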

vfdev-5 commented 6 months ago

@leej3 can you check why https://github.com/pytorch/ignite/actions/runs/8451157627/job/23149407599?pr=3220 is failing?

leej3 commented 6 months ago

> @leej3 can you check why https://github.com/pytorch/ignite/actions/runs/8451157627/job/23149407599?pr=3220 is failing?

looking through it now...

leej3 commented 6 months ago

I refactored the commits to make it easier to spot the modifications to the files that were in `ignite.contrib.metrics` and are now in `ignite.metrics`.

I can't spot what might be causing the failure of the TPU tests, though. The numerical difference is small, but it persists across reruns:

```
    def test_distrib_single_device_xla():
        device = idist.device()
>       _test_distrib_compute(device)

tests/ignite/metrics/regression/test_mean_error.py:233:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/ignite/metrics/regression/test_mean_error.py:127: in _test_distrib_compute
    _test("cpu")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

metric_device = device(type='cpu')

    def _test(metric_device):
        metric_device = torch.device(metric_device)
        m = MeanError(device=metric_device)

        y_pred = torch.rand(size=(100,), device=device)
        y = torch.rand(size=(100,), device=device)

        m.update((y_pred, y))

        y_pred = idist.all_gather(y_pred)
        y = idist.all_gather(y)

        np_y = y.cpu().numpy()
        np_y_pred = y_pred.cpu().numpy()

        np_sum = (np_y - np_y_pred).sum()
        np_len = len(np_y_pred)
        np_ans = np_sum / np_len

>       assert m.compute() == pytest.approx(np_ans)
E       assert 0.003967249393463134 == 0.003967256546020508 ± 4.0e-09
E
E         comparison failed
E         Obtained: 0.003967249393463134
E         Expected: 0.003967256546020508 ± 4.0e-09
```

leej3 commented 6 months ago

I didn't discover anything especially useful from SSH-ing into the TPU test machine. Some of the tests in `ignite/metrics/regression` fail intermittently on master and on this branch. The failures are deterministic, but they are triggered by things like the order in which the tests run: for example, `pytest regression/test_mean_error.py` reliably passes on both branches, while `pytest regression` reliably fails on both branches.
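
One plausible mechanism for this kind of order dependence (my assumption, not something confirmed on the TPU machine): the tests draw inputs from torch's global RNG, so the values a test sees depend on how many tests consumed RNG state before it ran. A minimal sketch:

```python
import torch

def inputs_under_test() -> torch.Tensor:
    # Stands in for a test body that draws from the global RNG,
    # like the torch.rand(size=(100,)) calls in the traceback above.
    return torch.rand(100)

torch.manual_seed(0)
alone = inputs_under_test()        # values seen when the test runs first

torch.manual_seed(0)
_ = torch.rand(50)                 # an earlier test consuming RNG state
after_other = inputs_under_test()  # same test, different values now

# Deterministic per ordering, but different across orderings -- so a
# borderline tolerance can pass in one ordering and fail in another.
assert not torch.equal(alone, after_other)
```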

I have adjusted the tolerance of the comparisons to make the tests pass more reliably.
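
For context on the `± 4.0e-09` band in the failure above: `pytest.approx` defaults to a relative tolerance of `1e-6`, and `1e-6 × 0.00396725… ≈ 4.0e-9`, slightly tighter than the observed ~7e-9 gap. A sketch of the kind of loosening meant here (the `rel=1e-5` value is illustrative, not necessarily the value actually committed):

```python
import pytest

expected = 0.003967256546020508  # numbers from the failing TPU run above
obtained = 0.003967249393463134

# Default band: rel=1e-6 -> about 4.0e-9 here, which the ~7.2e-9
# difference between the two values exceeds.
assert obtained != pytest.approx(expected)

# A looser relative tolerance absorbs the cross-device float drift.
assert obtained == pytest.approx(expected, rel=1e-5)
```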