triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton ensemble not working as expected to support reshape #7355

Open wzhongyuan opened 3 weeks ago

wzhongyuan commented 3 weeks ago

Description

Hi Team,

I tried to configure my ensemble model with reshape (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#reshape), but it is not working as expected.
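For reference, the documented reshape feature lets the shape accepted by the inference API differ from the shape the backend actually sees; a minimal fragment (field values are just for illustration) would be:

```
input {
  name: "text"
  data_type: TYPE_STRING
  dims: [ 1 ]
  reshape { shape: [ ] }
}
```

With max_batch_size > 0 this accepts requests of shape [batch, 1] while the backend receives just [batch]. In protobuf text format, an empty `reshape { }` block leaves the repeated `shape` field empty, i.e. it is equivalent to `reshape { shape: [ ] }`.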

For the ensemble, I have two models: a Python model as preprocessor and an ONNX model. Below is the generated config file for each, including the ensemble one:

python preprocessor

name: "pre"
backend: "python"
max_batch_size: 8
input {
  name: "text"
  data_type: TYPE_STRING
  dims: 1
  reshape {
  }
}
output {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "token_type_ids"
  data_type: TYPE_INT64
  dims: -1
}
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
instance_group {
  count: 4
}

Model

name: "main_app"
platform: "onnxruntime_onnx"
backend: "onnxruntime"
max_batch_size: 8
input {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "token_type_ids"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "embedding"
  data_type: TYPE_FP32
  dims: 768
}
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
instance_group {
  count: 4
}

The ensemble

name: "ensemble"
platform: "ensemble"
max_batch_size: 8
input {
  name: "text"
  data_type: TYPE_STRING
  dims: 1
  reshape {
  }
}
output {
  name: "embedding"
  data_type: TYPE_FP32
  dims: 768
}
ensemble_scheduling {
  step {
    model_name: "pre"
    model_version: -1
    input_map {
      key: "text"
      value: "text"
    }
    output_map {
      key: "token_type_ids"
      value: "token_type_ids"
    }
    output_map {
      key: "input_ids"
      value: "input_ids"
    }
    output_map {
      key: "attention_mask"
      value: "attention_mask"
    }
  }
  step {
    model_name: "main_app"
    model_version: -1
    input_map {
      key: "token_type_ids"
      value: "token_type_ids"
    }
    input_map {
      key: "input_ids"
      value: "input_ids"
    }
    input_map {
      key: "attention_mask"
      value: "attention_mask"
    }
    output_map {
      key: "embedding"
      value: "embedding"
    }
  }
}

We can see from the config that both the ensemble and the preprocessor have reshape set. However, when I started the Triton server, I got the error below:

E0617 02:43:08.069270 106905 model_repository_manager.cc:563] Invalid argument: in ensemble ensemble, ensemble tensor text: inconsistent shape: [-1] is inferred from model ensemble while [-1,1] is inferred from model pre
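To illustrate how the two shapes in the error could arise (this is my own reading of the message, not Triton's actual code): if the consistency check prepends the variable batch dimension to either the reshape shape or the declared dims, the ensemble side and the "pre" side of the same tensor end up disagreeing:

```python
def full_shape(dims, reshape=None):
    """Hypothetical sketch: prepend the variable batch dim (-1) to the
    shape a config entry reports; reshape, when set, replaces dims."""
    base = dims if reshape is None else reshape
    return [-1] + list(base)

# Ensemble input "text": dims [1] with an empty reshape (shape [])
print(full_shape([1], reshape=[]))  # prints [-1]
# The same tensor as inferred for model "pre" without the reshape applied
print(full_shape([1]))              # prints [-1, 1]
```

Under that reading, the error suggests the reshape on one side of the ensemble boundary is not being taken into account during shape inference.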

Could you please help check and advise what the issue is and how we can address it? Thanks!

Triton Information

What version of Triton are you using? 23.01

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

The config above shows it.


harryskim commented 2 weeks ago

@statiraju for visibility