Septend-fun opened 2 months ago
Hi, I have another question about NPU latency. I got these results when testing the matmul op:

- batch=32, inC=4096, outC=11008: latency is 16.58 ms
- batch=32, inC=11008, outC=4096: latency is 2.3 ms

I think these two cases have similar FLOPs and IO. Why is there such a big difference?
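As a sanity check, the raw cost of both shapes can be worked out in plain Python (no NPU library involved; the `matmul_cost` helper below is just for illustration, assuming fp16 operands and each operand touched once):

```python
# Back-of-envelope cost of a (batch x inC) @ (inC x outC) matmul.
def matmul_cost(batch, inC, outC, bytes_per_elem=2):
    macs = batch * inC * outC  # one multiply-accumulate per output element per inC step
    io_elems = batch * inC + inC * outC + batch * outC  # input + weights + output
    return macs, io_elems * bytes_per_elem

for batch, inC, outC in [(32, 4096, 11008), (32, 11008, 4096)]:
    macs, io_bytes = matmul_cost(batch, inC, outC)
    print(f"inC={inC:>5} outC={outC:>5}: {macs / 1e9:.2f} GMAC, {io_bytes / 1e6:.2f} MB")
```

Both cases print identical numbers (~1.44 GMAC, ~91 MB moved), so the ~7x latency gap would presumably come from how the kernel tiles or lays out the two shapes, not from the amount of work.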
Sorry, I cannot reproduce this behavior.

Also, op support is ongoing, so stay tuned for new operations and dtypes to come.
Thanks for your reply. So in your test you got similar results for both cases, right? It may be caused by my environment; I'll check it.
Any update? I'm happy to help if you need it. Otherwise I'll close the issue.
Hi, experts. It seems that matmul with both the weight and input tensor dtypes as int8 is not supported, right? I must convert the weight to fp16 when using the matmul op.
The key is in `src/bindings.cpp`. If I set the `act_dtype` to int8, I get this error:

`Matmul op #0 must be ranked tensor of 16 bit float or 32 bit float or 32 bit int, but got tensor<1x16x16xsi8>`

It is probably caused by OpenVINO, but I think the NPU supports an int8 x int8 op, right?
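For reference, here is a minimal sketch of the fp16 workaround described above, with NumPy's `@` standing in for the NPU matmul call (the shapes and scale are made up for illustration; substitute the library's actual matmul entry point):

```python
import numpy as np

batch, inC, outC = 32, 4096, 11008

# Pretend these weights came from an int8-quantized checkpoint.
w_int8 = np.random.randint(-128, 128, size=(outC, inC), dtype=np.int8)
scale = np.float16(0.01)  # per-tensor scale, illustrative only

x = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)

# Workaround: dequantize the int8 weights to fp16 before the matmul,
# since an int8 activation dtype is rejected by the op verifier.
w_fp16 = w_int8.astype(np.float16) * scale

y = x @ w_fp16.T  # stand-in for the NPU matmul(x, w_fp16)
print(y.shape, y.dtype)  # (32, 11008) float16
```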