Open wacky6 opened 2 years ago
Another factor is download time. IIUC, the current tfjs format (for example) doesn't support float16, so tfjs-converter upconverts float16 weights to float32. This isn't ideal because it doubles the model size. I think it makes more sense to always serve the model optimistically in its "native" floating-point format and do any conversion at run time based on the device's hardware.
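To make the size cost concrete, here's a minimal sketch (the weight values are made up) comparing the serialized size of the same weights in float16 vs. float32, using Python's `struct` module (`e` = IEEE 754 half precision, `f` = single precision):

```python
import struct

# Hypothetical weight values for illustration.
weights = [0.5, -1.25, 3.0, 0.125]

# Serialize the same values at half vs. single precision.
fp16_bytes = struct.pack(f"<{len(weights)}e", *weights)
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)

print(len(fp16_bytes))  # 8  (2 bytes per weight)
print(len(fp32_bytes))  # 16 (4 bytes per weight)
```

Serving float32 exactly doubles the payload, which is pure overhead when the model was trained and exported in float16 to begin with.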
Relates to https://github.com/webmachinelearning/webnn/issues/252
Some accelerators use non-standard floating-point types (e.g. bfloat16 and TF32). These are important for achieving high performance (e.g. by using NVIDIA's Tensor Cores) and/or for reducing resource usage (e.g. FP32→FP16 halves memory usage).
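For readers unfamiliar with bfloat16: it keeps the sign bit, the full 8-bit float32 exponent, and only the top 7 mantissa bits, i.e. it is just the upper 16 bits of a float32. A rough sketch of the conversion (simple truncation rather than round-to-nearest, which real hardware typically uses):

```python
import struct

def fp32_to_bfloat16_bits(x: float) -> int:
    # bfloat16 is the upper 16 bits of an IEEE 754 float32:
    # sign (1) + exponent (8) + top mantissa bits (7).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # truncate low mantissa bits (round toward zero)

def bfloat16_bits_to_fp32(b: int) -> float:
    # Widening back is exact: pad the low 16 mantissa bits with zeros.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

x = 3.14159
y = bfloat16_bits_to_fp32(fp32_to_bfloat16_bits(x))
# y approximates x to roughly 2-3 decimal digits (7-bit mantissa)
```

Because bfloat16 keeps the float32 exponent range, FP32→BF16 conversion is cheap and rarely overflows, which is part of why accelerators favor it over FP16 for training.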
How could MLLoader leverage these types? Some ideas: