None of the input buffers of fused_matmul5_add_add2_add2
are model weights, so they are expected to be non-quantized.
@Hzfengsy
lv20 and lv21 are model weights and have already been quantized.
lv20 && lv21 -> lv22
lv22 is the input of fused_matmul5_add_add2_add2.
Then, is it possible to merge fused_decode6 and fused_matmul5_add_add2_add2?
Thanks for pointing it out. In this case, you are right: we need to fuse decode and matmul into one kernel.
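
To make that concrete, here is a minimal NumPy sketch (not MLC-LLM/TVM code) of the dataflow described above and of what fusing decode and matmul into one kernel changes. The names lv20/lv21/lv22 mirror the thread; the shapes, group size, and the `(value - 8) * scale` decode rule are assumptions for illustration only, not the actual layout produced by the build.

```python
import numpy as np

# Assumed toy shapes and a made-up 4-bit group-quantization scheme.
n, k, group = 256, 256, 32
rng = np.random.default_rng(0)
lv20 = rng.integers(0, 16, size=(n, k), dtype=np.uint8)       # quantized weight values (lv20)
lv21 = rng.random((n, k // group), dtype=np.float32) * 0.01   # per-group scales (lv21)
x = rng.random((1, k), dtype=np.float32)                      # activation

def decode(data, scale):
    # Standalone "decode" kernel: materializes the full float weight matrix (lv22).
    return (data.astype(np.float32) - 8.0) * np.repeat(scale, group, axis=1)

# Two kernels, as in the dumped module: decode first, then the matmul.
lv22 = decode(lv20, lv21)
out_two_kernels = x @ lv22.T

def fused_decode_matmul(x, data, scale):
    # Single fused kernel: dequantize each weight row on the fly inside the
    # matmul, so the full float weight matrix is never written to memory.
    out = np.zeros((x.shape[0], data.shape[0]), dtype=np.float32)
    for i in range(data.shape[0]):
        w_row = (data[i].astype(np.float32) - 8.0) * np.repeat(scale[i], group)
        out[:, i] = x @ w_row
    return out

out_fused = fused_decode_matmul(x, lv20, lv21)
assert np.allclose(out_two_kernels, out_fused, atol=1e-4)
```

The point of the fusion is the second function: the quantized weights stay quantized in memory, and dequantization happens inside the matmul loop instead of producing an intermediate lv22 buffer.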
Closing this due to inactivity.
Dear all, I used the command below to build the model:
Then I checked mod_build_stage.py. In the decode function,
you can see that "fused_matmul5_add_add2_add2" is not quantized.