"On the GPU it uses float16 for the weights and the intermediate tensors, but float32 for the calculations. You can turn this off with the option allowLowPrecisionAccumulationOnGPU from MLModelConfiguration, in which case the GPU also uses float16 for the calculations. This is a bit faster but you may lose precision."
Hi.
In https://github.com/hollance/neural-engine/blob/master/docs/16-bit.md, you wrote
Do you have any reference for this description?
At WWDC 19 (https://developer.apple.com/videos/play/wwdc2019/704/, around 39:00), they said,
So my guess is that this option may only be effective on macOS, and that the change it controls is from float64 to float32.
I tried the option on an iOS device without a Neural Engine, but it did not seem to give any speedup.
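For reference, this is roughly how I set the option in my test (the `MyModel` class name is just a placeholder for my compiled Core ML model):

```swift
import CoreML

let config = MLModelConfiguration()
// Restrict execution to CPU/GPU so the Neural Engine is not used
config.computeUnits = .cpuAndGPU
// Allow float16 accumulation on the GPU instead of float32
config.allowLowPrecisionAccumulationOnGPU = true

// "MyModel" stands in for the Xcode-generated model class
let model = try MyModel(configuration: config)
```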
Thanks.