Tests fail on macbook with incomplete MPS support

mehta-lab / waveorder

Wave optical models and inverse algorithms for label-agnostic imaging of density & orientation.

BSD 3-Clause "New" or "Revised" License

Tests fail on macbook with incomplete MPS support #154

Open talonchandler opened 7 months ago

talonchandler commented 7 months ago

The first set of tests, in `test_correction.py`, fails when I run them locally with:

E       NotImplementedError: The operator 'aten::linalg_lstsq.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Seems like imperfect support for macbook gpus in torch. Related #150.

talonchandler commented 7 months ago

I just confirmed that `export PYTORCH_ENABLE_MPS_FALLBACK=1` fixes the correction tests, but one of the MPS tests in stokes now fails:

FAILED tests/test_stokes.py::test_copying[mps] - torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 4 is zero, the inversion could not be completed becaus...

@ziw-liu what do you think is best here? If we're still supporting macbooks then we should add macOS to the tests.

ziw-liu commented 7 months ago

I think we should only officially support x86_64 CPUs and NVIDIA GPUs. MPS backend in torch is experimental and is expected to break (thus the test is disabled in CI). That being said, the partial GPU support is not documented and maybe shouldn't be until we reach feature parity for the deconvolution models.

ziw-liu commented 7 months ago

GH does have M1 CI runners now. So when MPS becomes mature we can enable the tests again.

ziw-liu commented 7 months ago

But with `PYTORCH_ENABLE_MPS_FALLBACK=1` I can get all tests to pass (on main). @talonchandler can you make a new environment and test again?
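For anyone who would rather not export the variable in every shell, here is a minimal sketch of setting it from Python instead, for example at the top of a `conftest.py`. The helper name `enable_mps_fallback` is hypothetical and not part of waveorder. As far as I know torch reads the variable when it is loaded, so this assumes the call happens before the first `import torch`.

```python
import os


def enable_mps_fallback(env=os.environ):
    """Turn on the CPU fallback for ops that MPS does not implement.

    Hypothetical helper (not part of waveorder). setdefault keeps any
    value the user has already chosen, so an explicit "0" is respected.
    """
    env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return env["PYTORCH_ENABLE_MPS_FALLBACK"]


# Assumption: this runs before the first `import torch` in the process.
enable_mps_fallback()
```

This only covers unimplemented ops such as `aten::linalg_lstsq.out`; the fallback runs them on the CPU, so it will be slower than native MPS.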

talonchandler commented 7 months ago

Tried with a new environment and the tests pass. Thanks for testing @ziw-liu!

I've just opened #156 as an interim documentation fix.

> I think we should only officially support x86_64 CPUs and NVIDIA GPUs. MPS backend in torch is experimental and is expected to break (thus the test is disabled in CI).

Sounds good to me. Should we leave this issue open until we get full MPS support and complete the GPU support?

ziw-liu commented 3 months ago

> GH does have M1 CI runners now. So when MPS becomes mature we can enable the tests again.

Actually, MPS is not available in VMs due to Apple being Apple, so CI won't be feasible...
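Since MPS can be missing on a given machine (inside a VM, for instance), one option is to build the device list for parametrized tests at runtime and include `mps` only when explicitly requested and reported as available. This is a rough sketch, not how waveorder's tests are actually set up: `available_devices` and its `include_mps` flag are hypothetical names, while `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the real torch checks.

```python
def available_devices(include_mps=False):
    """Return device names that tests could be parametrized over.

    Hypothetical helper (not part of waveorder). MPS is opt-in because
    the backend is still experimental.
    """
    devices = ["cpu"]
    try:
        import torch
    except ImportError:
        # Without torch installed, only the CPU entry is reported.
        return devices
    if torch.cuda.is_available():
        devices.append("cuda")
    if include_mps and torch.backends.mps.is_available():
        devices.append("mps")
    return devices
```

A test could then use something like `@pytest.mark.parametrize("device", available_devices())`, so MPS cases are simply not generated on runners where the backend is absent.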