microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Onnx Batch Processing #6044

Open agemagician opened 3 years ago

agemagician commented 3 years ago

Ask a Question

Question

I have converted a BERT model from transformers to ONNX. When I use a single sample per batch, the results match the original PyTorch model. However, when I predict the output of many samples per batch, the results are completely different from the original PyTorch model. Is there anything we need to modify to make ONNX predict a batch of samples, or can it only work with a single sample per batch?

Further information

Notes

Code example: https://github.com/agemagician/ProtTrans/blob/master/Embedding/Onnx/ProtBert-BFD.ipynb
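For reference, a minimal sketch of batched inference with ONNX Runtime for an exported BERT model might look like the following. The model path, tokenizer name, and example sentences here are illustrative assumptions, not taken from the linked notebook:

```python
import numpy as np
import onnxruntime as ort
from transformers import BertTokenizer

# Hypothetical model path and tokenizer; substitute your own exported model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
session = ort.InferenceSession("bert.onnx")

sequences = ["a short input", "a longer input sentence that forces padding in the batch"]
# Pad the batch to a common length so the inputs form rectangular tensors.
encoded = tokenizer(sequences, padding=True, return_tensors="np")

ort_inputs = {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
}
# Some exported BERT graphs also expect token_type_ids; include it only if the graph has that input.
if "token_type_ids" in {i.name for i in session.get_inputs()}:
    ort_inputs["token_type_ids"] = encoded["token_type_ids"].astype(np.int64)

outputs = session.run(None, ort_inputs)
last_hidden_state = outputs[0]  # typically shape (batch, seq_len, hidden)
print(last_hidden_state.shape)
```

The exact input names depend on how the model was exported, so they may need to be adjusted to match `session.get_inputs()`.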

tianleiwu commented 3 years ago

@agemagician, did you solve the issue?

We have used batches of samples with the BERT model and did not encounter such a problem.

In batch mode, an attention mask is usually needed since some sequences have padding. When comparing with the PyTorch output, the values for those padded (masked) tokens should be excluded.
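A minimal sketch of such a masked comparison, assuming the ONNX and PyTorch hidden states are already available as NumPy arrays of shape (batch, seq_len, hidden) and `attention_mask` is the (batch, seq_len) mask used for both runs (all names here are hypothetical):

```python
import numpy as np

def max_diff_excluding_padding(onnx_out, torch_out, attention_mask):
    # True for real tokens, False for padding positions.
    mask = attention_mask.astype(bool)
    # Element-wise absolute difference between the two outputs.
    diff = np.abs(onnx_out - torch_out)
    # Largest difference over non-padded positions only.
    return diff[mask].max()
```

Differences at padded positions are expected and can be ignored; only the masked comparison is meaningful.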

agemagician commented 3 years ago

@tianleiwu, thanks for your reply.

Unfortunately not.

In my code example, I am sending both "input_ids" and "attention_mask" to the ONNX inference model.

Could you check my code and let me know what went wrong? Alternatively, could you provide a Colab example, with and without batch processing, that produces the same results?