bhaswa opened this issue 1 year ago
An accuracy gap of 5% is found in the exported ONNX model.
Could you identify the wave files that cause inconsistent recognition results? If yes, could you use one of them to compute the encoder output and compare whether the encoder output is the same for icefall and sherpa-onnx?
Btw, I calculated the accuracy of onnx model using ./zipformer/onnx_pretrained-streaming.py, not sherpa-onnx.
That is also OK. It is much easier to get the encoder output with ./zipformer/onnx_pretrained-streaming.py.
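One low-friction way to do the comparison is to dump the encoder output to disk in both scripts and diff the files offline. A minimal sketch follows; the file names and the `np.save` hook are assumptions, not part of the existing scripts:

```python
import os
import tempfile

import numpy as np

# Hypothetical workflow: in the PyTorch script, right after the encoder
# forward, add something like
#     np.save("encoder_pth.npy", encoder_out.detach().numpy())
# and in ./zipformer/onnx_pretrained-streaming.py add
#     np.save("encoder_onnx.npy", encoder_out)
# then compare the two dumps offline:
with tempfile.TemporaryDirectory() as d:
    # stand-in arrays; in practice these are the files saved by the two scripts
    out = np.ones((1, 16, 512), dtype=np.float32)
    np.save(os.path.join(d, "encoder_pth.npy"), out)
    np.save(os.path.join(d, "encoder_onnx.npy"), out)

    a = np.load(os.path.join(d, "encoder_pth.npy"))
    b = np.load(os.path.join(d, "encoder_onnx.npy"))
    print(a.shape == b.shape, np.allclose(a, b, atol=1e-4))
```

Comparing saved arrays keeps the two inference runs completely independent, so neither script needs to import the other's runtime.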
@csukuangfj The output from the encoder layer does not match. I checked two audio files: for one, the recognition result is the same; for the other, it is different. In both cases, the encoder output does not match.
@csukuangfj Any update on this?
output from the encoder layer is not matching
How large is the difference? If the input is the same, the encoder output should also be the same within some numeric tolerance.
I double-checked the output. The outputs from the encoder layer are completely different. In fact, the dimensions do not match.
Dimension for pth: 1 x 16 x 256
Dimension for onnx: 1 x 16 x 512
I double-checked the output. The outputs from the encoder layer are completely different. In fact, the dimensions do not match.
Dimension for pth: 1 x 16 x 256
Dimension for onnx: 1 x 16 x 512

Please apply the joiner.encoder_proj layer to the output of PyTorch.
The ONNX version invokes joiner.encoder_proj automatically.
After applying the joiner.encoder_proj layer to the encoder output, the dimensions now match, but the values are still different.
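For reference, the projection step can be sketched with NumPy standing in for joiner.encoder_proj (an nn.Linear mapping 256 to 512 in this setup); in icefall itself it would simply be `model.joiner.encoder_proj(encoder_out)`. The weights below are random placeholders, not the trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_out = rng.standard_normal((1, 16, 256)).astype(np.float32)  # pth encoder output
W = rng.standard_normal((512, 256)).astype(np.float32) * 0.01       # proj weight (assumed)
b = np.zeros(512, dtype=np.float32)                                  # proj bias (assumed)

# A linear layer is x @ W.T + b; this lifts the last dim from 256 to 512.
projected = encoder_out @ W.T + b
print(projected.shape)  # (1, 16, 512), the same shape as the ONNX encoder output
```

This makes clear why the raw pth output had a smaller last dimension: the ONNX export already folds the projection into the encoder graph.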
but values are still different.
How large is the difference? You can use (a - b).abs().max() to get the max difference.
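The same check in NumPy, with illustrative arrays standing in for the two encoder outputs (PyTorch after joiner.encoder_proj, and ONNX); the noise level and tolerance here are assumptions, not project-defined thresholds:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((1, 16, 512)).astype(np.float32)          # "pth" output
b = a + rng.normal(0, 1e-5, a.shape).astype(np.float32)           # small numeric noise

max_diff = np.abs(a - b).max()
print(max_diff)  # on the order of 1e-5 to 1e-4 here; values around 1e-3 or
                 # below usually indicate numeric tolerance, much larger
                 # values point at a real mismatch (wrong states, wrong chunking)
```

Reporting this single number makes it easy to tell a float32 rounding difference from a genuinely different computation.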
The number of times the encoder is called during pth inference is different from ONNX inference. All code paths use streaming, FYI.
For a 0.5-second audio, pth calls the encoder 2 times, whereas ONNX calls it only 1 time.
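A differing call count is consistent with the two scripts feeding the encoder different chunk sizes. The arithmetic below is only a sketch; the real chunk lengths come from the exported model's metadata, and the 32/64-frame values are hypothetical:

```python
import math

def num_encoder_calls(num_frames: int, chunk_frames: int) -> int:
    # one encoder call per feature chunk (ignoring lookahead/tail padding)
    return math.ceil(num_frames / chunk_frames)

# 0.5 s of audio at 100 feature frames per second is about 50 frames
print(num_encoder_calls(50, 32))  # 2 calls with a 32-frame chunk
print(num_encoder_calls(50, 64))  # 1 call with a 64-frame chunk
```

So before comparing outputs, it is worth confirming both scripts use the same chunk size and the same amount of tail padding; otherwise even identical weights will produce different per-call outputs.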
Hi, I have trained the latest streaming zipformer model with a custom dataset and exported the model to ONNX. When I compare the output of the original pth model and the ONNX model, an accuracy gap of 5% is found in the exported ONNX model.