Open cgr71ii opened 3 years ago
Might be a warm-up issue. Take a look at this explanation from @stevenlix: https://github.com/microsoft/onnxruntime/issues/7230#issuecomment-814619248
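One way to check whether the slowdown is only warm-up is to time the first few inferences separately from the steady state. A minimal sketch (pure stdlib; `run_once` is a placeholder for an actual `session.run(...)` call):

```python
import time

def time_runs(run_once, warmup=3, iters=10):
    """Time a callable, discarding the first `warmup` calls.

    TensorRT builds its engines on the first inference(s), so those
    calls are excluded from the steady-state measurement.
    """
    for _ in range(warmup):          # engine build / warm-up runs
        run_once()
    timings = []
    for _ in range(iters):           # steady-state runs
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)

# Dummy workload standing in for session.run(...):
best, mean = time_runs(lambda: sum(range(1000)))
```

If the steady-state numbers stay slow after warm-up, the problem is not engine-build cost alone.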
Describe the bug
I am trying to improve the performance of the LASER model, but after completing all the steps needed to run it with the TensorRT provider, inference is very slow.
Urgency
High (project-related timelines: 08/2021)
System information
Other:
To Reproduce
The whole procedure uses a conda environment.
onnxruntime
```shell
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
./build.sh --parallel --build --update --config Release \
  --cuda_home /usr/local/cuda \
  --cudnn_home /usr/local/cuda/lib64 \
  --tensorrt_home /home/cgarcia/Documentos/tensorrt/TensorRT-7.2.3.4 \
  --use_tensorrt --build_wheel \
  --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) \
  --cuda_version=11.4 --enable_pybind
pip install ./build/Linux/Release/dist/onnxruntime_gpu_tensorrt-1.8.0-cp36-cp36m-linux_x86_64.whl
cd ..
```
Check

```shell
python -c "import tensorrt"    # ok
python -c "import onnxruntime" # ok
```
PS: I might have forgotten some dependencies or scripts needed to make all the previous steps work. If needed, I will update the issue with whatever is missing.
When I execute the model with the CPU, CUDA, or TensorRT provider, the results are as expected, but the times are not. With CPU and CUDA the times are similar to the corresponding PyTorch version, but the TensorRT provider does not perform as expected.
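For reference, the provider fallback order can be set explicitly when creating the session. A sketch (the model path and function name are placeholders; `onnxruntime` is imported lazily so the snippet loads even where it is not installed):

```python
# Preferred execution-provider order: TensorRT first, then CUDA, then CPU.
PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def make_session(model_path):
    """Create an InferenceSession with an explicit provider fallback chain.

    ONNX Runtime tries the providers in order and falls back to the next
    one for any node the current provider cannot handle.
    """
    import onnxruntime as ort  # lazy import: sketch only
    return ort.InferenceSession(model_path, providers=PROVIDERS)
```

Checking `session.get_providers()` on the created session confirms which providers were actually enabled.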
Expected behavior
Be at least as fast as the CUDA provider.
Additional context
I have also tried the script `symbolic_shape_infer.py`, but the result is the same: with the CUDA provider the task takes ~30 min, but with TensorRT I have not even been able to finish it (e.g., for 30 sentences, the first inference takes ~5 min and the rest take ~10 min). For reference, the file `bucc2018.fr-en.train.txt.en` contains 369,810 lines, and the task processes 4 language pairs with a total of 2,657,641 sentences (i.e., 2,657,641 embeddings to generate).

Files
Could not attach the files, so I uploaded them to Drive: https://drive.google.com/file/d/1J7masfUv6Wt4QrBHoWpbbDHkDvgM7GyS/view?usp=sharing
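One commonly suggested mitigation for slow TensorRT inference on variable-length text (not confirmed as the fix here) is to pad sentence lengths up to a small set of fixed bucket sizes, since the TensorRT provider may rebuild its engine whenever it sees a new input shape. A sketch with hypothetical helper names:

```python
def bucket_length(n_tokens, buckets=(16, 32, 64, 128)):
    """Round a sentence length up to the nearest fixed bucket size.

    Restricting inputs to a few fixed lengths keeps the number of
    TensorRT engine (re)builds bounded instead of one per new shape.
    """
    for b in buckets:
        if n_tokens <= b:
            return b
    return buckets[-1]  # anything longer is clamped to the largest bucket

def group_by_bucket(tokenized_sentences):
    """Group tokenized sentences by their padded bucket length."""
    groups = {}
    for tokens in tokenized_sentences:
        groups.setdefault(bucket_length(len(tokens)), []).append(tokens)
    return groups
```

Batches would then be padded to their bucket length before being fed to the session, so only `len(buckets)` distinct input shapes ever reach TensorRT.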