Open lemousehunter opened 5 months ago
For more context: I am trying to generate embeddings for multiple texts in a single request. For a text input of shape (2,), the BGE-m3 model outputs shape (2, 1024). However, the ensemble model still returns an output of shape (2048,) instead (the bge-m3 output is flattened by the forced reshaping).
Description I have specified [-1, 1024] as the output dimensions for my ensemble model, but the output is still reshaped to [1024].
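To make the shape mismatch concrete, here is a minimal NumPy sketch of what I see versus what I expect. It does not call Triton at all: the arrays are stand-ins, `restore_batch` is a hypothetical helper name, and it assumes the flattening is row-major (each text's 1024 values stay contiguous), which I have not verified against the server. It is only a client-side workaround, not a fix for the ensemble behavior.

```python
import numpy as np

EMBED_DIM = 1024  # embedding width of the bge-m3 output reported in this issue


def restore_batch(flat, embed_dim=EMBED_DIM):
    """Reshape a flattened ensemble output back to (batch, embed_dim).

    Assumes row-major flattening, i.e. the embeddings were concatenated
    one after another in request order.
    """
    flat = np.asarray(flat)
    if flat.size % embed_dim != 0:
        raise ValueError(
            f"output size {flat.size} is not a multiple of {embed_dim}"
        )
    return flat.reshape(-1, embed_dim)


# Stand-in for the bge-m3 output for two input texts: shape (2, 1024).
expected = np.arange(2 * EMBED_DIM, dtype=np.float32).reshape(2, EMBED_DIM)

# Stand-in for what the ensemble returns: the same values, flattened.
observed = expected.reshape(-1)

restored = restore_batch(observed)
print(expected.shape, observed.shape, restored.shape)
```

If the row-major assumption holds, `restored[i]` is the embedding for the i-th input text.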
Triton Information NVIDIA Release 24.03 (build 86102629) Triton Server Version 2.44.0
Are you using the Triton container or did you build it yourself? I am using the NGC Triton container.
To Reproduce
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Backend of last model in ensemble: ONNX Runtime
Expected behavior I expected the output not to be reshaped, since the batched output of the last model in the ensemble has the same dimensions as the output specified for the ensemble model. bge-m3_config.zip