triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

PyTriton with ONNX is slower than ONNX Runtime for a tiny BERT model #61

Open yan123456jie opened 8 months ago

yan123456jie commented 8 months ago

The model is https://huggingface.co/jesseyan/tiny_classify_bert/tree/main

ONNX Runtime server: https://github.com/yan123456jie/model_speed_test/blob/master/src/onnx/onnx_server.py
ONNX Runtime speed test: https://github.com/yan123456jie/model_speed_test/blob/master/src/onnx_web.py

PyTriton server: https://github.com/yan123456jie/model_speed_test/blob/master/src/pytriton/pytriton_server.py
PyTriton speed test: https://github.com/yan123456jie/model_speed_test/blob/master/src/pytriton_web.py
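Since the comparison hinges on client-measured latency, results are easiest to compare when both backends are timed the same way. A minimal stdlib-only timing helper (a hypothetical sketch, not taken from the linked scripts) illustrates one way to collect warmup-corrected latency statistics for any callable, whether it wraps a direct `session.run` or an HTTP request to the PyTriton endpoint:

```python
import statistics
import time

def benchmark(call, warmup=10, iters=100):
    """Time repeated invocations of `call` and return latency stats in ms."""
    for _ in range(warmup):  # discard warmup runs (caches, lazy init, JIT)
        call()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": samples[len(samples) // 2],
        "p95_ms": samples[int(len(samples) * 0.95)],
    }

# Usage sketch: pass a closure over the inference call you want to measure,
# e.g. benchmark(lambda: ort_session.run(None, feed)) for ONNX Runtime, or a
# lambda issuing one request against the PyTriton HTTP endpoint.
stats = benchmark(lambda: sum(range(1000)), warmup=2, iters=50)
```

Measuring p50/p95 rather than only the mean helps separate steady-state per-request overhead (serialization, HTTP round trip) from occasional outliers.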

My final test result is below.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.