Model quantization (using int8s instead of floats for faster inference) is all the rage these days, it seems. The Oracle Devs AlphaZero blog post series writes extensively about how quantization improved their inference throughput (they claim a 4x speedup).
We should experiment with this; I have minimal familiarity with the technique.
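For reference, here is a minimal sketch of what a first experiment might look like using PyTorch's dynamic int8 quantization, assuming our model is a standard `torch.nn.Module`. `PolicyValueNet` here is a hypothetical placeholder, not our actual network.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Hypothetical stand-in for whatever network we actually use."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 64)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = PolicyValueNet().eval()

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time. Here only nn.Linear layers
# are converted; conv layers would need static quantization instead.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    print(quantized(x))
```

Dynamic quantization is the lowest-effort entry point (no calibration data needed), so it seems like a reasonable way to measure whether we see anything close to the claimed throughput gain before investing in static quantization.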