Open: weibingo opened this issue 1 month ago
@weibingo Any chance you have a similar model on Hugging Face? Are you using the optimum-gpu or tensorrt backend? Are you sure TensorRT is correctly installed?
Yes. The model works fine with optimum on CUDA. The TensorRT environment also had an error, but I resolved that. I tested embedder/optimum.py by directly initializing OptimumEmbedder, and the error still occurs. Then I looked at the source code: in utils_optimum.py, the TensorrtExecutionProvider options work if trt_cuda_graph_enable is left out. But I don't understand why it works without that option and fails when trt_cuda_graph_enable is set.
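For context, a minimal sketch (not the infinity source itself; the option names are real ONNX Runtime TensorrtExecutionProvider settings, but the model path is a placeholder) of how such provider options are passed to an InferenceSession. The ONNX Runtime docs note that CUDA graph capture assumes the input shapes stay fixed across runs, which may be relevant to the second inference failing:

```python
# Sketch only: real ORT TensorrtExecutionProvider options, placeholder path.
import onnxruntime as ort

provider_options = {
    "trt_engine_cache_enable": True,  # cache built TRT engines between runs
    # "trt_cuda_graph_enable": True,  # with this enabled the second inference
    #                                 # reportedly fails; without it, it works
}

session = ort.InferenceSession(
    "/home/xxxx/peg_onnx/model.onnx",  # placeholder path from this report
    providers=[
        ("TensorrtExecutionProvider", provider_options),
        "CUDAExecutionProvider",  # fallback provider
    ],
)
```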
@weibingo No idea why CUDA graph capture does not work. I have not used TRT much; it only had marginal performance gains over onnx-gpu.
@michaelfeil So you don't test with --engine optimum --device tensorrt?
@weibingo It's not possible to test in CI (which is CPU-only), and I have not used it locally in the last 3 months. Before that, it was extensively tested with TensorRT 8.6.1.
System Info
Command: infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: Linux
Base model: PEG
nvidia-smi: CUDA 11.8
TensorRT: 8.6.1
Reproduction
1. Just start the server with the command above.
Expected behavior
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 1444, in forward
    model_outputs = self._prepare_onnx_outputs(use_torch, *onnx_outputs)
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 939, in _prepare_onnx_outputs
    model_outputs[output_name] = onnx_outputs[idx]
IndexError: tuple index out of range
Then I printed the model's run inputs and outputs and found that during model warmup the first inference is OK but the second one raises the error. If I start with --no-model-warmup, the server starts, but the second inference still fails the same way.
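For anyone trying to reproduce this outside infinity, a rough sketch (untested here; the model path and the exact failing batch shapes are assumptions based on this report) that drives the same optimum code path with the TensorRT provider directly:

```python
# Repro sketch: two forward passes with different batch shapes; per the
# report, the second call raises IndexError when trt_cuda_graph_enable is set.
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_dir = "/home/xxxx/peg_onnx"  # placeholder path from this report
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_dir,
    provider="TensorrtExecutionProvider",
    provider_options={"trt_cuda_graph_enable": True},
)

for batch in (["hello"], ["hello", "world"]):  # second run changes the shape
    inputs = tokenizer(batch, padding=True, return_tensors="pt")
    outputs = model(**inputs)  # reportedly fails on the second call
    print(outputs.last_hidden_state.shape)
```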