This update fixes two issues with phi models on Metal and WASM. First, f32 models previously produced NaN due to a tanh bug, now fixed by https://github.com/apache/tvm/pull/16438. Second, the Q·K matmul in f16 could overflow and produce INF; we now use a mixed-precision matmul that accumulates into an f32 buffer instead.
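The overflow issue can be illustrated with a minimal NumPy sketch (illustrative only, not the actual kernel): summing a long f16 dot product in an f16 accumulator can exceed f16's maximum finite value (~65504) and overflow to inf, while accumulating the same products into an f32 buffer stays finite. The vector length and values below are made up for demonstration.

```python
import numpy as np

# Hypothetical Q/K rows with plausible pre-softmax magnitudes.
q = np.full(4096, 8.0, dtype=np.float16)
k = np.full(4096, 8.0, dtype=np.float16)

# Pure-f16 accumulation: each product is 64.0, and the running sum
# eventually exceeds the f16 max (~65504) and overflows to inf.
acc16 = np.float16(0.0)
for a, b in zip(q, k):
    acc16 = np.float16(acc16 + np.float16(a * b))

# Mixed precision: multiply the f16 inputs but accumulate in f32,
# which has far more headroom (max ~3.4e38).
acc32 = np.float32(0.0)
for a, b in zip(q, k):
    acc32 += np.float32(a) * np.float32(b)

print(np.isinf(acc16))  # True: the f16 accumulator overflowed
print(acc32)            # 262144.0, comfortably within f32 range
```

The actual fix applies the same idea inside the attention kernel: the per-element products remain f16, but partial sums land in an f32 buffer, so intermediate scores never overflow.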