microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Onnx Batch Processing #6044

Open agemagician opened 3 years ago

agemagician commented 3 years ago

Ask a Question

Question

I have converted a BERT model from transformers to ONNX. When I use a single sample per batch, the results match the original PyTorch model. However, when I predict the output of many samples per batch, the results are completely different from the original PyTorch model. Is there anything we need to modify to make ONNX predict a batch of samples, or can it only work with a single sample per batch?

Further information

Notes

Code example: https://github.com/agemagician/ProtTrans/blob/master/Embedding/Onnx/ProtBert-BFD.ipynb
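For reference, a minimal sketch of batched inference with ONNX Runtime for an exported BERT model might look like the following. The model path, tokenizer name, and example sentences here are illustrative assumptions, not taken from the linked notebook:

```python
import numpy as np
import onnxruntime as ort
from transformers import BertTokenizer

# Hypothetical model path and tokenizer; substitute your own exported model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
session = ort.InferenceSession("bert.onnx")

sequences = ["a short input", "a longer input sentence that forces padding in the batch"]
# Pad the batch to a common length so the inputs form rectangular tensors.
encoded = tokenizer(sequences, padding=True, return_tensors="np")

ort_inputs = {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
}
# Some exported BERT graphs also expect token_type_ids; include it only if the graph has that input.
if "token_type_ids" in {i.name for i in session.get_inputs()}:
    ort_inputs["token_type_ids"] = encoded["token_type_ids"].astype(np.int64)

outputs = session.run(None, ort_inputs)
last_hidden_state = outputs[0]  # typically shape (batch, seq_len, hidden)
print(last_hidden_state.shape)
```

The exact input names depend on how the model was exported, so they may need to be adjusted to match `session.get_inputs()`.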

tianleiwu commented 3 years ago

@agemagician, did you solve the issue?

We have used batches of samples with the BERT model and did not encounter such a problem.

In batch mode, an attention mask is usually needed since some sequences have padding. When comparing with the PyTorch output, the values for those padded (masked) tokens should be excluded.
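A minimal sketch of such a masked comparison, assuming the ONNX and PyTorch hidden states are already available as NumPy arrays of shape (batch, seq_len, hidden) and `attention_mask` is the (batch, seq_len) mask used for both runs (all names here are hypothetical):

```python
import numpy as np

def max_diff_excluding_padding(onnx_out, torch_out, attention_mask):
    # True for real tokens, False for padding positions.
    mask = attention_mask.astype(bool)
    # Element-wise absolute difference between the two outputs.
    diff = np.abs(onnx_out - torch_out)
    # Largest difference over non-padded positions only.
    return diff[mask].max()
```

Differences at padded positions are expected and can be ignored; only the masked comparison is meaningful.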

agemagician commented 3 years ago

@tianleiwu, thanks for your reply.

Unfortunately not.

In my code example, I am sending both "input_ids" and "attention_mask" to the ONNX inference model.

Could you check my code and let me know what went wrong? Alternatively, could you provide a Colab example, with and without batch processing, that produces the same results?