"On the GPU it uses float16 for the weights and the intermediate tensors, but float32 for the calculations. You can turn this off with the option allowLowPrecisionAccumulationOnGPU from MLModelConfiguration, in which case the GPU also uses float16 for the calculations. This is a bit faster but you may lose precision."
Hi.
In https://github.com/hollance/neural-engine/blob/master/docs/16-bit.md, you wrote
Do you have any reference for this description?
At WWDC 19 (https://developer.apple.com/videos/play/wwdc2019/704/, around 39:00), they said,
So my guess is that this option may only be effective on macOS, and that the change it controls is from float64 to float32.
I tried the option on an iOS device without a Neural Engine, but it did not seem to give any speedup.
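For reference, this is roughly how I set the option in my test (the `MyModel` class name is just a placeholder for my compiled Core ML model):

```swift
import CoreML

let config = MLModelConfiguration()
// Restrict execution to CPU/GPU so the Neural Engine is not used
config.computeUnits = .cpuAndGPU
// Allow float16 accumulation on the GPU instead of float32
config.allowLowPrecisionAccumulationOnGPU = true

// "MyModel" stands in for the Xcode-generated model class
let model = try MyModel(configuration: config)
```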
Thanks.