Closed thammegowda closed 7 months ago
@mjpost updated instructions for testing these changes.
There seems to be a problem with multi-GPU usage in pymarian: the model gets loaded onto all the requested GPU devices, but only the first GPU is used for inference.
How to reproduce:

```shell
# terminal 1: run evaluation on four GPUs
paste tmp.{src,mt} | pymarian-evaluate --stdin -m chrfoid-wmt23 -d 0 1 2 3

# terminal 2: watch GPU usage
gpustat -cup -i 1
```
Fixed it. Since there is no iterator support at the moment, we make minibatches in Python (to avoid buffering all scores in memory and then waiting until the end).
The batch_size in Python was set too small (mini_batch), so only one GPU was utilized. Fixed it by setting batch_size = mini_batch * maxi_batch.
TODO: support passing iterators between Python and C++ so we can eliminate minibatching in Python.
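The Python-side batching described above can be sketched as a simple generator. The helper name `minibatches` is hypothetical (pymarian's internal names may differ); the point is that the chunk handed to the C++ backend must be `mini_batch * maxi_batch` lines so the backend has enough work to spread across all requested GPUs.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def minibatches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size lines.

    Hypothetical sketch of the python-side batching; avoids buffering
    the whole input (and all scores) in memory at once.
    """
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# With the fix, the effective batch passed to C++ is mini_batch * maxi_batch,
# large enough for the backend to split work across all requested devices.
mini_batch, maxi_batch = 16, 100
batch_size = mini_batch * maxi_batch
```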
Closing since we have merged these changes in Azure DevOps fork!
Description
List of changes:
Replaced scikit-build with scikit-build-core, the next-gen build system; replaced setup.py with pyproject.toml (setup.py is deprecated)
Revised pymarian code and added an evaluator interface; split pymarian.h into separate translator and evaluator .hpp files
Added BufferedVectorCollector to access scores in memory without I/O
Reorganized the pymarian dir into tests and examples
Added an evaluator example script that downloads metrics from our blob storage (publicly accessible)
Configured CLI executables: pymarian-evaluate, pymarian-qtdemo, pymarian-mtapi
Added dependencies: none
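The scikit-build-core switch and the CLI executables above would be wired up in pyproject.toml roughly as follows. This is a hedged sketch, not the PR's actual file: the version, dependency list, and `module:function` entry-point paths are placeholders, though the script names match the CLIs listed above.

```toml
[build-system]
requires = ["scikit-build-core", "pybind11"]
build-backend = "scikit_build_core.build"

[project]
name = "pymarian"
version = "0.0.1"  # placeholder

# Console scripts replace the old setup.py entry_points;
# the module paths here are hypothetical.
[project.scripts]
pymarian-evaluate = "pymarian.evaluate:main"
pymarian-qtdemo = "pymarian.qtdemo:main"
pymarian-mtapi = "pymarian.mtapi:main"
```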
How to test
These instructions are added to README in src/python.
Example Usage
mtapi
Launch server
Example request from client
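A client request could be sketched as below. The endpoint path, port, and JSON schema here are assumptions for illustration only; check the README in src/python for the actual request format.

```python
import json

# Hypothetical payload for the pymarian-mtapi server: a list of source
# lines under a "text" key (the real schema may differ).
payload = {"text": ["Hello world"]}
body = json.dumps(payload)

# Sending it (commented out so no live server is required):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/translate",  # hypothetical host/path
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())

print(body)
```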
QtDemo
Checklist