marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository
https://marian-nmt.github.io

Bind Evaluator interface to pymarian #1013

Closed thammegowda closed 7 months ago

thammegowda commented 1 year ago

Description

List of changes:

Added dependencies: none
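
The point of the binding is to let Python call a marian metric model directly instead of shelling out to the CLI. The sketch below shows that intent only; the class name, constructor string, and evaluate() signature are assumptions for illustration, not the confirmed pymarian API (the tested entry points are the pymarian-* commands below).

# illustrative sketch only -- names, paths, and signatures are assumed
from pymarian import Evaluator   # assumed import path

# hypothetical: build an evaluator from marian-style CLI arguments
evaluator = Evaluator("-m metric-model.npz -v vocab.spm vocab.spm")

# hypothetical: score (source, hypothesis) pairs, one score per pair
scores = evaluator.evaluate([["source sentence", "mt hypothesis"]])
print(scores)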

How to test

These instructions have also been added to the README in src/python.

git checkout tg/pybind-new
# build and install -- along with optional dependencies for demos
# run this from root of project, i.e., dir with pyproject.toml
pip install -v .[demos]   

# using a specific version of compiler (e.g., gcc-9 g++-9)
CMAKE_ARGS="-DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9" pip install -v .[demos]

# with CUDA on
CMAKE_ARGS="-DCOMPILE_CUDA=ON" pip install . 

# with a specific version of the CUDA toolkit, e.g., CUDA 11.5
CMAKE_ARGS="-DCOMPILE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5" pip install -v .[demos]
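
A quick way to verify that the wheel built and installed correctly is to import the module (a minimal check, assuming the install above succeeded in the current environment):

import pymarian
print("pymarian loaded from:", pymarian.__file__)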

Example Usage

# download sample dataset
langs=en-ru
prefix=tmp.$langs
testset=wmt21/systems
sysname=Online-B
sacrebleu -t $testset -l $langs --echo src > $prefix.src
sacrebleu -t $testset -l $langs --echo ref > $prefix.ref
sacrebleu -t $testset -l $langs --echo $sysname > $prefix.mt

# chrfoid
paste $prefix.{src,mt} | head | pymarian-evaluate --stdin -m chrfoid-wmt23 

# cometoid22-wmt{21,22,23}
paste $prefix.{src,mt} | head | pymarian-evaluate --stdin -m cometoid22-wmt22

# bleurt20
paste $prefix.{ref,mt} | head | pymarian-evaluate --stdin  -m bleurt20 --debug
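
If you want a single system-level number from the segment scores, a small helper like the following can be piped onto the commands above (a sketch that assumes pymarian-evaluate writes one numeric segment score per input line to stdout):

# average_scores.py -- read one score per line from stdin and print the mean
# usage: paste $prefix.{src,mt} | pymarian-evaluate --stdin -m cometoid22-wmt22 | python average_scores.py
import sys

scores = [float(line.split()[0]) for line in sys.stdin if line.strip()]
if scores:
    print(f"segments={len(scores)}  mean_score={sum(scores) / len(scores):.4f}")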

mtapi

Launch server

# example model: download and extract
wget http://data.statmt.org/romang/marian-regression-tests/models/wngt19.tar.gz 
tar xvf wngt19.tar.gz 

# launch server
pymarian-mtapi -s en -t de "-m wngt19/model.base.npz -v wngt19/en-de.spm wngt19/en-de.spm"

Example request from client

 URL="http://127.0.0.1:5000/translate"
 curl $URL --header "Content-Type: application/json" --request POST --data '[{"text":["Good Morning."]}]'
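
The same request can be made from Python; this sketch is equivalent to the curl call above (it assumes the requests package is installed and the server from the previous step is listening on 127.0.0.1:5000):

import requests

url = "http://127.0.0.1:5000/translate"
payload = [{"text": ["Good Morning."]}]

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json())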

QtDemo

pymarian-qt

Checklist

thammegowda commented 12 months ago

@mjpost I've updated the instructions for testing these changes.

thammegowda commented 11 months ago

There seems to be a problem with multi-GPU usage in pymarian. The model gets loaded onto all the requested GPU devices, but only the first GPU is used for inference.

How to reproduce:

terminal 1: paste tmp.{src,mt} | pymarian-evaluate --stdin -m chrfoid-wmt23 -d 0 1 2 3

terminal 2 (watch GPU usage): gpustat -cup -i 1

thammegowda commented 11 months ago

Fixed it. Since there is no iterator support at the moment, minibatches are made in Python (to avoid buffering all scores in memory and waiting until the end). The batch_size in Python was set too small (mini_batch), so only one GPU was utilized. Fixed by setting batch_size = mini_batch * maxi_batch. TODO: support passing iterators between Python and C++ so we can eliminate minibatching in Python.
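
For reference, the Python-side batching logic described above boils down to something like this sketch (function and variable names are illustrative, not the actual pymarian code):

from itertools import islice

def make_batches(lines, mini_batch=16, maxi_batch=100):
    # the fix: hand the backend mini_batch * maxi_batch lines at a time,
    # so all requested devices get work without buffering the whole input
    batch_size = mini_batch * maxi_batch
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield batch

# e.g. for batch in make_batches(sys.stdin): scores = evaluator.evaluate(batch)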

thammegowda commented 7 months ago

Closing since we have merged these changes in the Azure DevOps fork!