triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

transformer model output mismatch #6042

Closed riyaj8888 closed 1 year ago

riyaj8888 commented 1 year ago

Description

After deploying a transformer model with Triton Inference Server, I am getting different output from the deployed model than from a local copy of the same model.

Triton Information

23.03-py

Are you using the Triton container or did you build it yourself?

Triton container.

To Reproduce

Steps to reproduce the behavior:

  1. Convert the Hugging Face XLM-RoBERTa model to ONNX format (a hedged export sketch follows this list).
  2. Create config.pbtxt.
  3. Deploy using tritonserver.
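For reference, a minimal sketch of step 1 using Hugging Face Optimum. The export path and checkpoint name are assumptions; the issue only says an XLM-RoBERTa model was converted to ONNX.

```python
# Hedged sketch: export an XLM-RoBERTa checkpoint to ONNX via Optimum.
# "xlm-roberta-base" and the output directory are illustrative choices.
from optimum.onnxruntime import ORTModelForFeatureExtraction

model = ORTModelForFeatureExtraction.from_pretrained(
    "xlm-roberta-base",
    export=True,  # convert the PyTorch weights to ONNX on load
)
model.save_pretrained("onnx-xlmroberta")  # writes model.onnx plus configs
```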

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble, include the model configuration file for that as well).

Preprocessing model config:

```
name: "preprocess-xlmroberta"
backend: "python"
max_batch_size: 1

input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]

output [
  {
    name: "output_0"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "output_1"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```
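The issue does not include the model.py behind this Python backend config. A minimal sketch of what it plausibly looks like, assuming a Hugging Face tokenizer (the checkpoint name is an assumption):

```python
# Hedged sketch of a Python-backend model.py matching the config above:
# takes a TYPE_STRING "text" input, returns tokenized "output_0"/"output_1".
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Assumed checkpoint; the issue only says XLM-RoBERTa.
        self.tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

    def execute(self, requests):
        responses = []
        for request in requests:
            texts = pb_utils.get_input_tensor_by_name(request, "text").as_numpy()
            # TYPE_STRING elements arrive as bytes; decode before tokenizing.
            texts = [t.decode("utf-8") for t in texts.flatten()]
            enc = self.tokenizer(texts, return_tensors="np", padding=True)
            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("output_0", enc["input_ids"].astype(np.int64)),
                pb_utils.Tensor("output_1", enc["attention_mask"].astype(np.int64)),
            ]))
        return responses
```

One thing worth checking in a setup like this is that the tokenizer version and settings (padding, truncation, special tokens) match the ones used in the local comparison run.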

Ensemble model config:

```
name: "test-model"
platform: "ensemble"
max_batch_size: 1

input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

output [
  {
    name: "embeddings"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

ensemble_scheduling {
  step [
    {
      model_name: "preprocess-xlmroberta"
      model_version: 1
      input_map {
        key: "text"
        value: "text"
      }
      output_map {
        key: "output_0"
        value: "input_ids"
      }
      output_map {
        key: "output_1"
        value: "attention_mask"
      }
    },
    {
      model_name: "tensorrt-xlmroberta"
      model_version: 1
      input_map {
        key: "input_ids"
        value: "input_ids"
      }
      input_map {
        key: "attention_mask"
        value: "attention_mask"
      }
      output_map {
        key: "output"
        value: "embeddings"
      }
    }
  ]
}
```

Expected behavior

I am using the ONNX format of the model. I expect the deployed model (served by Triton) and a local copy of the same model to produce the same outputs.
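To make the mismatch concrete, a sketch of the comparison the expected behavior describes: query the ensemble over HTTP and compare against a local ONNX Runtime run. The endpoint, paths, checkpoint name, and matching output shapes are all assumptions.

```python
# Hedged sketch: compare Triton ensemble output against a local ONNX run.
import numpy as np
import onnxruntime as ort
import tritonclient.http as httpclient
from transformers import AutoTokenizer

text = "hello world"

# Triton side: the ensemble takes a single string and returns "embeddings".
client = httpclient.InferenceServerClient("localhost:8000")
inp = httpclient.InferInput("text", [1, 1], "BYTES")
inp.set_data_from_numpy(np.array([[text.encode("utf-8")]], dtype=object))
triton_out = client.infer("test-model", [inp]).as_numpy("embeddings")

# Local side: tokenize the same way and run the exported ONNX model.
tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed checkpoint
enc = tok(text, return_tensors="np")
sess = ort.InferenceSession("onnx-xlmroberta/model.onnx")  # assumed path
local_out = sess.run(None, {"input_ids": enc["input_ids"],
                            "attention_mask": enc["attention_mask"]})[0]

# Assumes both paths produce the same shape; a large diff reproduces the bug.
print("max abs diff:", np.max(np.abs(triton_out - local_out)))
```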

Tabrizian commented 1 year ago

The outputs can differ due to the non-deterministic nature of some of the frameworks/libraries used in your Python model. Please see this section for more information on how to increase the determinism of your model: https://github.com/triton-inference-server/python_backend#frameworks
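For context, the linked section concerns framework-level determinism. A sketch of the kind of seeding and flags it refers to, for a PyTorch-based model (these are standard PyTorch APIs; whether they apply depends on what the Python model actually uses):

```python
# Hedged sketch: make a PyTorch-based Python model more deterministic.
import random

import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # error out on non-deterministic ops
torch.backends.cudnn.benchmark = False    # disable autotuned (variable) kernels
```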

dyastremsky commented 1 year ago

Closing due to inactivity. If you would like to reopen this issue for follow-up, please let us know.