Closed: SForeKeeper closed this issue 2 years ago
The ANE is significantly faster and more efficient than a regular GPU. I don't think using the GPU instead of the ANE will make it faster.
You can try modifying the code that creates the MLModel here to set computeUnits to cpuAndGPU, and see if the M1 Max GPU can be faster than the ANE.
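If the model has been converted with coremltools, the same experiment can be done from Python. Here is a minimal sketch, assuming the coremltools package; the member names below match coremltools.ComputeUnit (the Python counterpart of Swift's MLModelConfiguration.computeUnits), and "SRCNN.mlmodel" is a placeholder path:

```python
# Short benchmark labels mapped to coremltools.ComputeUnit member names.
UNIT_NAMES = {
    "ane": "ALL",          # let Core ML choose; on M1 this is usually the ANE
    "gpu": "CPU_AND_GPU",  # bypass the ANE and run on CPU/GPU
    "cpu": "CPU_ONLY",
}

def load_model(path: str, target: str = "gpu"):
    """Load a Core ML model pinned to a given compute-unit setting."""
    import coremltools as ct  # imported lazily; UNIT_NAMES is usable without it
    unit = getattr(ct.ComputeUnit, UNIT_NAMES[target])
    return ct.models.MLModel(path, compute_units=unit)

# Usage (placeholder path): model = load_model("SRCNN.mlmodel", "gpu")
```

With `compute_units` set to `CPU_AND_GPU`, Core ML never schedules the prediction on the ANE, which makes an ANE-vs-GPU timing comparison possible.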
Indeed, it's not any faster. It actually takes nearly the same time on the 32-core GPU as on the 16-core ANE.
It seems that Apple doesn't provide a configuration that utilizes both. I'm also uncertain how other models, like Real-ESRGAN in the iOS version, would behave.
I wonder if batch processing could be implemented: if the ANE and GPU cannot both be used on the same image, would it be possible to process multiple images at the same time, using all available compute resources?
If the Core ML framework has an internal lock that prevents multiple predictions from being executed at the same time, then no. But it should be possible to use Core ML for the ANE and MPSCNN for the GPU at the same time. The only issue is that the whole ML graph must be constructed by hand in MPSCNN, which would be a huge pain, especially with newer models that are far more complicated than SRCNN.
If you have downloaded the Mac Catalyst version of waifu2x-ios, you can try writing a script (for example using multiprocessing in Python) to launch multiple waifu2x instances. Passing --srcnn-mps forces it to use MPSCNN for SRCNN. Unfortunately, only SRCNN supports both Core ML and MPSCNN at the moment.
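The suggestion above could be sketched roughly as follows. This assumes a hypothetical `./waifu2x` executable taking input files as arguments; only the --srcnn-mps flag comes from the thread, everything else is an illustrative guess:

```python
import subprocess
from multiprocessing import Pool

def chunks(items, n):
    """Split items into n roughly equal, interleaved chunks."""
    return [items[i::n] for i in range(n)]

def run_instance(args):
    """Run one waifu2x process over a chunk of files (hypothetical CLI)."""
    files, use_mps = args
    cmd = ["./waifu2x", *(["--srcnn-mps"] if use_mps else []), *files]
    return subprocess.run(cmd).returncode

def upscale_in_parallel(files):
    """Launch one MPSCNN (GPU) worker and one Core ML (ANE) worker."""
    jobs = list(zip(chunks(files, 2), [True, False]))
    with Pool(2) as pool:
        return pool.map(run_instance, jobs)
```

Because each worker is a separate process running its own instance, the Core ML worker can occupy the ANE while the MPSCNN worker occupies the GPU, sidestepping any per-process serialization inside Core ML.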
The latest version (6.1) of waifu2x-ios on the Mac App Store now supports utilizing the ANE and GPU at the same time to accelerate batch processing. You can download this version and give it a try if you want.
Downloaded and tested, thanks for making this happen. Though in the benchmark it seems that the 32-core GPU consumes significantly more energy and memory, yet is still outperformed by the ANE.
Here's my benchmark result; I hope it benefits your future development.
Yes, the result definitely helps. It looks like the M1 architecture scales very linearly: I got about 70 on the 8-core M1 and 140 on the 16-core M1 Pro. The ANE score is the same on all three chips.
The benchmark results represent theoretical maximum performance, though. The benchmark feeds fake data to Core ML as fast as possible. Actual image processing will not be as fast because of the many additional processing steps involved.
It always gets stuck on my M2 chip. May I ask why that is?
Hi, it seems that this app only utilizes the Apple Neural Engine, leaving the GPU unused during the prediction phase.
Since all M1 processors have the same number of Neural Engine cores, using an M1 Max will not make this program any faster. Is it possible to use the GPU to accelerate the prediction phase?
Thanks.