microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Awful performance with LASER model when using TensorRT provider #8315

Open cgr71ii opened 3 years ago

cgr71ii commented 3 years ago

Describe the bug
I am trying to improve the performance of the LASER model, but after completing all the steps needed to run it with the TensorRT provider, inference is very slow.

Urgency
High (project-related timelines: 08/2021)

System information

Other:

To Reproduce
In the whole procedure, I am using a conda environment.

1. Install TensorRT and onnxruntime from source (otherwise there will be no TensorRT support):
```bash
# TensorRT
## Download TensorRT from https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/7.2.2/tars/TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz
## Following steps from https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar

# onnxruntime
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
./build.sh --parallel --build --update --config Release \
    --cuda_home /usr/local/cuda \
    --cudnn_home /usr/local/cuda/lib64 \
    --tensorrt_home /home/cgarcia/Documentos/tensorrt/TensorRT-7.2.3.4 \
    --use_tensorrt --build_wheel \
    --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) \
    --cuda_version=11.4 --enable_pybind
pip install ./build/Linux/Release/dist/onnxruntime_gpu_tensorrt-1.8.0-cp36-cp36m-linux_x86_64.whl
cd ..

# Check
python -c "import tensorrt"    # ok
python -c "import onnxruntime" # ok
```

2. Modify the LASER model so it can be exported to ONNX from PyTorch (file `embed_opt.py`, or the already exported `laser.onnx`)
3. Download [LASER](https://github.com/facebookresearch/LASER) and install its dependencies
4. Download the BUCC data ([instructions from LASER](https://github.com/facebookresearch/LASER/tree/master/tasks/bucc))
5. Copy the uploaded `embed_opt.py` to `$LASER/source` (it will be executed instead of the original `embed.py` by `bucc.sh`). Use the uploaded `bucc.sh` as `$LASER/tasks/bucc/bucc.sh` instead of the original. Copy the uploaded `embedding_util.py` to `$LASER/source`.
6. Modify, if necessary, line 305, col 114, of `$LASER/source/embed_opt.py` to adjust the amount of GPU memory used (I used 15 GiB, but it can be lowered to ~5 GiB or even less).
7. Execute (a sketch of how the ORT session is set up follows this block):
```bash
# You will need to modify the path to your TensorRT directory
export LD_LIBRARY_PATH="/home/cgarcia/Documentos/tensorrt/TensorRT-7.2.3.4/lib"

cd $LASER/tasks/bucc
# laser.onnx will be generated if not provided (it has been uploaded with
# xz compression; it must be decompressed before use or before inspecting
# it with netron)
/usr/bin/time -v ./bucc.sh "--run-ort --ort-model $(pwd)/laser.onnx"
```
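For reference, a minimal sketch of how an ONNX Runtime session with the TensorRT provider can be created (this is not the exact code from `embed_opt.py`, which is in the attached files; the model path and provider list here are assumptions):

```python
import onnxruntime as ort

# The provider list is a priority order: nodes TensorRT cannot handle
# fall back to CUDA, and then to CPU.
providers = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
sess = ort.InferenceSession("laser.onnx", providers=providers)
print(sess.get_providers())  # confirm TensorRT was actually loaded
```

Checking `sess.get_providers()` is worthwhile because ORT silently falls back to CUDA/CPU if the TensorRT provider fails to load.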

PS: I might have forgotten some dependencies or scripts needed to make all the previous steps work. If needed, I will update the issue with whatever is missing.

When I execute the model with the CPU, CUDA or TensorRT provider, the results are as expected, but the times are not. The times with CPU and CUDA are similar to the corresponding PyTorch version, but TensorRT does not perform as expected in terms of time.

Expected behavior
Be at least as fast as the CUDA provider.

Additional context
I have also tried the `symbolic_shape_infer.py` script, but the result is the same:

```bash
python ./onnxruntime/onnxruntime/python/tools/symbolic_shape_infer.py \
    --input /home/cgarcia/Documentos/LASER/tasks/bucc/laser.onnx \
    --output /home/cgarcia/Documentos/LASER/tasks/bucc/new_laser.onnx \
    --auto_merge --verbose 3
```

When executing the task with the CUDA provider, it takes ~30 min, but with TensorRT I have not even been able to finish it (e.g. with 30 sentences, the first inference takes ~5 min and the remaining ones take ~10 min). For reference, the file bucc2018.fr-en.train.txt.en contains 369810 lines, and the task processes 4 language pairs with a total of 2657641 sentences (i.e. 2657641 embeddings to generate).
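To make the first-run vs. steady-state cost visible, a minimal timing sketch (the input name and shape below are hypothetical; the real feed is built from tokenized sentences):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("laser.onnx",
                            providers=["TensorrtExecutionProvider"])

# Hypothetical input; the real model consumes tokenized sentences.
feed = {"tokens": np.zeros((1, 50), dtype=np.int64)}

start = time.perf_counter()
sess.run(None, feed)  # first call: TensorRT builds its engine here
print(f"first run: {time.perf_counter() - start:.1f} s")

start = time.perf_counter()
for _ in range(10):
    sess.run(None, feed)  # same input shape, so no rebuild
print(f"avg after warm-up: {(time.perf_counter() - start) / 10:.3f} s")
```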

Files
I could not attach the files, so I uploaded them to Drive: https://drive.google.com/file/d/1J7masfUv6Wt4QrBHoWpbbDHkDvgM7GyS/view?usp=sharing

romank87 commented 3 years ago

Might be a warm-up issue. Take a look at this explanation from @stevenlix: https://github.com/microsoft/onnxruntime/issues/7230#issuecomment-814619248
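If it is indeed engine (re)build cost, a hedged sketch of two things that may help: cache the built engines on disk, and keep input shapes fixed so a new shape does not trigger a rebuild. The environment variable names below come from the TensorRT execution provider docs and may differ across ORT versions; check the docs for your build.

```python
import os

# Assumption: these TensorRT-EP variables are supported in this build;
# set them before creating the session.
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"      # reuse built engines
os.environ["ORT_TENSORRT_CACHE_PATH"] = "/tmp/trt_cache"  # where to store them
os.environ["ORT_TENSORRT_MAX_WORKSPACE_SIZE"] = str(2 << 30)  # 2 GiB workspace

import onnxruntime as ort

sess = ort.InferenceSession("laser.onnx",
                            providers=["TensorrtExecutionProvider"])
```

Note that engine caching only pays off if input shapes repeat; with fully dynamic sentence lengths, padding or bucketing inputs to a few fixed lengths tends to matter more.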

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.