ufal / lindat-translation

Frontend of LINDAT translation service
https://lindat.mff.cuni.cz/services/translation
BSD 2-Clause "Simplified" License

transformer_frontend

A simple Flask-based UI in front of TensorFlow Serving.

In our setup, the Flask app runs on a machine called transformer and TensorFlow Serving runs on another machine called t2t-transformer.
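
For orientation, here is a hypothetical, minimal sketch of that split: a Flask endpoint on transformer forwarding a request to TensorFlow Serving's REST API on t2t-transformer. This is not the actual app code; the route, port and payload shape are illustrative.

# Hypothetical forwarding sketch, not the real frontend code.
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)

# Illustrative serving endpoint; 8501 is TensorFlow Serving's default REST port.
SERVING = "http://t2t-transformer:8501/v1/models/{model}:predict"

@app.route("/translate/<model>", methods=["POST"])
def translate(model):
    # Forward the client's JSON to the serving machine's predict endpoint.
    resp = requests.post(SERVING.format(model=model), json=request.get_json(), timeout=500)
    resp.raise_for_status()
    return jsonify(resp.json())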

Install

To install and run the frontend:

git clone --recurse-submodules git@github.com:ufal/transformer_frontend
pip install -r requirements.txt
gunicorn -t 500 -k sync -w 12 -b 0.0.0.0:5000 uwsgi:app

systemd configs are provided to run the frontend as a system service; a sample Docker configuration (see Dockerfile, docker-compose.yml) is provided for testing. Both need tweaking.
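
For example, once the unit file and compose file are adjusted (the frontend service name below is taken from the restart step at the end of this README; everything else is illustrative):

sudo systemctl enable --now transformer   # run the frontend as a system service
docker-compose up --build                 # or bring up the sample docker setup for testing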

Serving

The easiest, but probably suboptimal, way (you likely want to compile it yourself) is to follow https://www.tensorflow.org/serving/setup and get a .deb package. There's also a Docker image; we use that in the sample setup (see docker-compose.yml), but you'll need to provide a model and set a proper path to it.
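
As a hedged example of the Docker route (model name, path and image tag are placeholders; for GPU serving you would use a -gpu tag and expose the GPUs to the container):

docker run --rm -p 8501:8501 \
  --mount type=bind,source=/path/to/exported/model,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving:2.1.0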

Serving build

The NVIDIA driver version we use is 440.33.01.

The following describes how I've managed to build version 2.1.0 (2.3.0 gives a SIGSEGV when you send any data; 2.2.0 had some strange startup times and didn't respond on the REST API).

The official "documentation" is https://github.com/tensorflow/serving/blob/2.1.0/tensorflow_serving/tools/docker/Dockerfile.devel-gpu

There is a compatibility matrix at https://www.tensorflow.org/install/source#gpu; it diverges from what gets pushed to Docker Hub (https://hub.docker.com/r/tensorflow/serving/tags).

Install bazelisk from https://github.com/bazelbuild/bazelisk and pin the Bazel version:

export USE_BAZEL_VERSION=0.24.1

Version 2.1.0 is built from git checkout d83512c6 of https://github.com/tensorflow/serving. The Python virtualenv contains the following packages; mind especially the numpy version:

certifi==2020.6.20
chardet==3.0.4
future==0.18.2
grpcio==1.32.0
h5py==2.10.0
idna==2.10
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
mock==4.0.2
numpy==1.18.5
pkg-resources==0.0.0
requests==2.24.0
six==1.15.0
urllib3==1.25.10
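
For reference, a sketch of preparing that checkout and virtualenv; the venv path and Python 3.6 interpreter are taken from the build command below, the rest is illustrative:

git clone https://github.com/tensorflow/serving
cd serving
git checkout d83512c6
python3.6 -m venv /mnt/transformers-shared/venv
/mnt/transformers-shared/venv/bin/pip install numpy==1.18.5 grpcio==1.32.0 h5py==2.10.0 \
  Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 mock==4.0.2 future==0.18.2 requests==2.24.0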

The following command sets the necessary variables and paths and runs the build:

TMP=/tmp \
CUDA_VISIBLE_DEVICES=0 \
TF_NCCL_VERSION= \
TF_NEED_CUDA=1 \
TF_NEED_TENSORRT=1 \
TENSORRT_INSTALL_PATH=/home/okosarko/junk/TensorRT-5.1.5.0/ \
TF_CUDA_VERSION=10.0 \
TF_CUDNN_VERSION=7 \
CUDNN_INSTALL_PATH=/opt/cuda/10.0/cudnn/7.6/ \
LD_LIBRARY_PATH=/opt/cuda/10.0/lib64/stubs:/opt/cuda/10.0/extras/CUPTI/lib64:/opt/cuda/10.0/lib64:/opt/cuda/10.0/cudnn/7.6/lib64/:/usr/include/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
PYTHONPATH=/mnt/transformers-shared/venv/lib/python3.6/site-packages \
bazelisk build --color=yes --curses=yes --config=cuda --config=nativeopt --config=release \
  --copt=-fPIC --verbose_failures --output_filter=DONT_MATCH_ANYTHING \
  --action_env PYTHON_BIN_PATH=/mnt/transformers-shared/venv/bin/python \
  tensorflow_serving/model_servers:tensorflow_model_server

You can then copy bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server elsewhere and run it with an appropriate LD_LIBRARY_PATH. To clean the build artifacts, run bazelisk clean --expunge.
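
For example (the target directory is the one the systemd setup below expects; trim LD_LIBRARY_PATH to the runtime libraries you actually need):

cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /opt/lindat_tranformer_service/
LD_LIBRARY_PATH=/opt/cuda/10.0/lib64:/opt/cuda/10.0/extras/CUPTI/lib64:/opt/cuda/10.0/cudnn/7.6/lib64/ \
  /opt/lindat_tranformer_service/tensorflow_model_server --model_config_file=/opt/lindat_tranformer_service/model.config --rest_api_port=8501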

Some test models are provided; to test them (based on https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/docker.md#tensorflow-serving-with-docker):

tensorflow_model_server --model_base_path=/home/okosarko/tensorflow-serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu/ --model_name=half_plus_two --rest_api_port=8501
curl -d '{"instances": [1.0, 2.0, 5.0]}' http://localhost:8501/v1/models/half_plus_two:predict
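
If serving works, the curl call should return something like {"predictions": [2.5, 3.0, 4.5]}, since the test model computes x/2 + 2.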

A few steps back you can also test cuDNN (the samples are in a .deb package downloadable separately from the source; use dpkg -x to unpack it) and TensorRT.
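
For example (the exact .deb filename depends on the cuDNN version you downloaded, so treat it as illustrative):

dpkg -x libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb ./cudnn-samples
# the samples typically end up under ./cudnn-samples/usr/src/cudnn_samples_v7/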

Configs

There are several config files; the ones referenced below are app/models.json (on the frontend machine) and model.config and batching.config (on the serving machine).

Adding new model

Assume we have two machines flask and gpu

  1. see scripts/export.sh or https://github.com/tensorflow/tensor2tensor/blob/ae042f66e013494eb2c4c2b50963da5a3d3fc828/tensor2tensor/serving/README.md#1-export-for-serving , but set the appropriate params. Pick a name ($MODEL)
  2. update app/models.json appropriately; the model name is $MODEL (this needs to be on flask)
  3. add the dictionary to t2t_data_dir (this needs to be on flask)
  4. update model.config; the name is $MODEL (this lives on gpu; the systemd script expects that file in /opt/lindat_tranformer_service). A minimal sketch of this file follows the list.
  5. restart both: sudo systemctl restart tensorflow_serving and sudo systemctl restart transformer
  6. check the serving logs for OOM errors with sudo journalctl -f -u tensorflow_serving; if you see them before translating anything, search for a way to dynamically swap the models; if you see them when translating, you might try fiddling with batching.config
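
A minimal model.config sketch for step 4, in TensorFlow Serving's model_config_list text format ($MODEL and the base path are placeholders):

model_config_list {
  config {
    name: "$MODEL"
    base_path: "/path/to/exported/models/$MODEL"
    model_platform: "tensorflow"
  }
}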