imxieyi / waifu2x-mac

Waifu2x-ios port to macOS, still in Core ML and Metal
MIT License
439 stars · 45 forks

Utilize GPU in the prediction phase #28

Closed: SForeKeeper closed this issue 2 years ago

SForeKeeper commented 2 years ago

Hi, it seems that this app only utilizes the Apple Neural Engine, leaving the GPU completely unused in the prediction phase.

Since all M1-family chips have the same 16-core Neural Engine, using an M1 Max will not make this program any faster. Would it be possible to use the GPU as well to accelerate the prediction phase?

Thanks.

imxieyi commented 2 years ago

The ANE is significantly faster and more efficient than the GPU. I don't think using the GPU instead of the ANE will make it faster.

You can try modifying the code that creates the MLModel here, setting computeUnits to .cpuAndGPU, and see whether the M1 Max GPU can be faster than the ANE.
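The suggestion above can be sketched roughly as follows. This is a minimal illustration of the MLModelConfiguration.computeUnits setting, not the actual code from this repo; the model path is a placeholder.

```swift
import CoreML

// Minimal sketch: load a compiled Core ML model pinned to CPU+GPU,
// bypassing the ANE. "model.mlmodelc" is a placeholder path.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU   // default is .all, which prefers the ANE

let url = URL(fileURLWithPath: "model.mlmodelc")
let model = try MLModel(contentsOf: url, configuration: config)
```

Other values of computeUnits are .cpuOnly and .all; comparing timings across these settings is the easiest way to see which unit a model actually runs on.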

SForeKeeper commented 2 years ago

It's indeed not any faster. It takes nearly the same time on the 32-core GPU as on the 16-core ANE.

It seems that Apple doesn't provide a configuration that utilizes both. I'm also unsure how this would play out on other models like Real-ESRGAN in the iOS version.

I wonder if batch processing could be implemented: if the ANE and GPU cannot both be used on the same image, would it be possible to process multiple images at the same time, using all available compute resources?

imxieyi commented 2 years ago

If the Core ML framework has an internal lock that prevents multiple predictions from being executed at the same time, then no. But it should be possible to use Core ML for the ANE and MPSCNN for the GPU at the same time. The only issue is that the whole ML graph must be constructed by hand in MPSCNN, which would be a huge pain, especially with newer models that are far more complicated than SRCNN.
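One way to probe whether concurrent predictions are possible is to create two MLModel instances from the same compiled model with different computeUnits and feed them from separate dispatch work items. This is a hedged sketch, not the app's implementation; the path and the commented-out inputs are placeholders.

```swift
import CoreML

// Sketch: two instances of the same model, one free to use the ANE,
// one pinned to CPU+GPU, dispatched concurrently.
let url = URL(fileURLWithPath: "model.mlmodelc")  // placeholder path

let aneConfig = MLModelConfiguration()
aneConfig.computeUnits = .all          // lets Core ML schedule on the ANE

let gpuConfig = MLModelConfiguration()
gpuConfig.computeUnits = .cpuAndGPU    // keeps this instance off the ANE

let aneModel = try MLModel(contentsOf: url, configuration: aneConfig)
let gpuModel = try MLModel(contentsOf: url, configuration: gpuConfig)

let queue = DispatchQueue.global(qos: .userInitiated)
queue.async {
    // _ = try? aneModel.prediction(from: inputA)  // placeholder input
}
queue.async {
    // _ = try? gpuModel.prediction(from: inputB)  // placeholder input
}
```

If both predictions overlap in wall-clock time, there is no global lock serializing them; if total time matches the sequential case, something is serializing the calls.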

If you have downloaded the Mac Catalyst version of waifu2x-ios, you can try writing a script (for example, using multiprocessing in Python) to launch multiple waifu2x instances. Passing --srcnn-mps forces it to use MPSCNN for SRCNN. Unfortunately, only SRCNN supports both Core ML and MPSCNN at the moment.
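A launcher along those lines could look like this. The binary path and all flags except --srcnn-mps (which is mentioned above) are placeholders; check the CLI's own help output for the real ones.

```python
import os
import subprocess
from multiprocessing import Pool

# Placeholder path to the waifu2x CLI; adjust for your install.
WAIFU2X = "/Applications/waifu2x.app/Contents/MacOS/waifu2x"

def build_command(src):
    # One command per input image. --srcnn-mps forces MPSCNN for SRCNN;
    # the -i/-o flags are assumptions, not verified against the real CLI.
    return [WAIFU2X, "--srcnn-mps", "-i", src, "-o", src + ".out.png"]

def run_one(src):
    # Each worker process launches its own waifu2x instance.
    return subprocess.run(build_command(src)).returncode

if __name__ == "__main__":
    images = ["a.png", "b.png", "c.png"]
    if os.path.exists(WAIFU2X):
        with Pool(processes=len(images)) as pool:
            print(pool.map(run_one, images))
```

Each pool worker is a separate OS process, so Core ML state is not shared between instances and any per-process serialization inside the framework is sidestepped.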

imxieyi commented 2 years ago

The latest version (6.1) of waifu2x-ios on the Mac App Store now supports utilizing the ANE and GPU at the same time to accelerate batch processing. You can download this version and give it a try if you want.

SForeKeeper commented 2 years ago

Downloaded and tested, thanks for making this happen. In the benchmark, though, the 32-core GPU consumes significantly more energy and memory and is still outperformed by the ANE.

Here's my benchmark result; I hope it helps your future development.

[screenshot: benchmark results]
imxieyi commented 2 years ago

Yes, the result definitely helps. It looks like the M1 architecture scales very linearly: I got about 70 on the 8-core M1 and 140 on the 16-core M1 Pro. The ANE score is the same on all three chips.

The benchmark results represent theoretical maximum performance, though. The benchmark feeds fake data to Core ML as fast as possible; actual image processing will not be as fast because of the extra processing involved.

nevertoday commented 1 year ago

It always gets stuck on my M2 chip. Why is that?

[screenshot]