Hello!
I'm using this on an Apple Silicon device (M1 Max) and noticed that it fails to use MPS. The issue comes from the amp.autocast call, which is hardcoded to CUDA here. While there is no AMP support for Apple Silicon yet, changing the device type to "cpu" in this spot fixes the error and lets torch run on MPS instead of falling back to CPU.
Let me know if you'd like any changes, should this be of interest to merge! I'm just tinkering at this point and will eventually move to a Linux-based production deployment anyway, so this isn't mission-critical.
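For reference, the change amounts to picking the autocast device type based on what's actually available. A minimal sketch of that logic (the helper name and its boolean-flag signature are illustrative, not the project's actual code; in practice you'd query `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def autocast_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the device_type to pass to torch.amp.autocast.

    AMP isn't supported on MPS yet, so even when the model itself runs
    on the MPS backend, autocast has to use "cpu" instead of "cuda".
    """
    if cuda_available:
        return "cuda"
    # Covers both MPS and plain-CPU machines: autocast on "cpu" is a
    # no-op-ish fallback that doesn't block the model from using MPS.
    return "cpu"


# Hypothetical call site, replacing the hardcoded "cuda":
#   with torch.amp.autocast(device_type=autocast_device(
#           torch.cuda.is_available(),
#           torch.backends.mps.is_available())):
#       ...
```

On an M1 Max this returns "cpu", which is what unblocks the MPS path described above.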