Open agemagician opened 3 years ago
@agemagician, did you solve the issue?
We have used batches of samples with the BERT model and did not encounter such a problem.
In batch mode, an attention mask is usually required because some sequences are padded. When comparing outputs, the values at those padded (masked) token positions should be excluded.
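For example, the padded positions can be dropped before comparison along these lines (a minimal sketch; the array names and shapes are assumptions, not code from the linked notebook):

```python
# Minimal sketch (assumed array names/shapes): keep only non-padded token
# vectors before comparing the ONNX and PyTorch outputs of a padded batch.
import numpy as np

def compare_unpadded(onnx_out, torch_out, attention_mask, atol=1e-4):
    # onnx_out, torch_out: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.astype(bool)      # True at real tokens, False at padding
    onnx_tokens = onnx_out[mask]            # (num_real_tokens, hidden)
    torch_tokens = torch_out[mask]
    print("max abs diff over real tokens:", np.abs(onnx_tokens - torch_tokens).max())
    return np.allclose(onnx_tokens, torch_tokens, atol=atol)
```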
@tianleiwu, thanks for your reply.
Unfortunately, not.
In my code example, I am sending both the "input_ids" and the "attention_mask" to the ONNX inference model.
Could you check my code and let me know what went wrong? Alternatively, could you provide a Colab example, with and without batch processing, that produces the same results?
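For reference, a batched ONNX Runtime call typically looks roughly like the sketch below; the checkpoint name, model path, and input names are assumptions, so they should be checked against the actual export (e.g. with `session.get_inputs()`):

```python
# Rough sketch of a batched ONNX Runtime call. The checkpoint, model path, and
# input names are assumptions -- verify them against the exported model.
import numpy as np
import onnxruntime as ort
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd")  # assumed checkpoint
session = ort.InferenceSession("protbert.onnx")                     # assumed path

batch = tokenizer(["M K T A", "M K T A Y L V"], padding=True, return_tensors="np")
ort_inputs = {
    "input_ids": batch["input_ids"].astype(np.int64),
    "attention_mask": batch["attention_mask"].astype(np.int64),
    # Some exports also expect "token_type_ids"; include it if the model requires it.
}
last_hidden_state = session.run(None, ort_inputs)[0]  # (batch, seq_len, hidden)
```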
Ask a Question
Question
I have converted a Bert model from transformers to ONNX. When I use a single sample per batch, the results match the original PyTorch model. However, when I predict the output of many samples per batch, the results are totally different from the original PyTorch model. Is there anything we need to modify to make ONNX predict a batch of samples, or can it only work with a single sample per batch?
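For context, here is a rough illustration (the checkpoint name and printed shapes are assumptions, not taken from my notebook) of how batching sequences of different lengths introduces padded positions that a single-sample run never computes:

```python
# Illustration only: batching sequences of unequal length forces padding, so the
# batched run contains positions that the single-sample run never sees.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd")  # assumed checkpoint

single = tokenizer("M K T A", return_tensors="np")
batched = tokenizer(["M K T A", "M K T A Y L V"], padding=True, return_tensors="np")

print(single["input_ids"].shape)    # e.g. (1, 6)  -- no padding needed
print(batched["input_ids"].shape)   # e.g. (2, 9)  -- shorter sequence gets padded
print(batched["attention_mask"])    # zeros mark the padded positions to ignore
```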
Further information
Relevant Area: prediction
Is this issue related to a specific model? Yes.
Model name: Bert
Model opset: 12
Notes
Code example: https://github.com/agemagician/ProtTrans/blob/master/Embedding/Onnx/ProtBert-BFD.ipynb