Closed thammegowda closed 7 months ago
@mjpost updated instructions for testing these changes.
There seems to be a problem with multi-GPU usage in pymarian: the model gets loaded onto all the requested GPU devices, but only the first GPU is used for inference.
How to reproduce:

```shell
# terminal 1: run evaluation on four GPUs
paste tmp.{src,mt} | pymarian-evaluate --stdin -m chrfoid-wmt23 -d 0 1 2 3

# terminal 2: watch GPU usage
gpustat -cup -i 1
```
Fixed it. Since there is no iterator support at the moment, we make minibatches in Python (to avoid buffering all scores in memory and then waiting until the end).
The batch_size in Python was set too small (mini_batch), so only one GPU was utilized. Fixed it by setting batch_size = mini_batch * maxi_batch.
TODO: support passing iterators between Python and C++ so we can eliminate minibatching in Python.
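The Python-side batching described above can be sketched as a simple generator. The helper name `minibatches` is hypothetical (pymarian's internal names may differ); the point is that the chunk handed to the C++ backend must be `mini_batch * maxi_batch` lines so the backend has enough work to spread across all requested GPUs.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def minibatches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size lines.

    Hypothetical sketch of the python-side batching; avoids buffering
    the whole input (and all scores) in memory at once.
    """
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# With the fix, the effective batch passed to C++ is mini_batch * maxi_batch,
# large enough for the backend to split work across all requested devices.
mini_batch, maxi_batch = 16, 100
batch_size = mini_batch * maxi_batch
```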
Closing since we have merged these changes in Azure DevOps fork!
Description
List of changes:
Replaced scikit-build with scikit-build-core, the next-gen build system; replaced setup.py with pyproject.toml (setup.py is deprecated)
Revised pymarian code and added an evaluator interface; split pymarian.h into separate translator and evaluator .hpp files
Added BufferedVectorCollector to access scores in memory without I/O
Reorganized the pymarian dir into tests and examples
Added an evaluator example script that downloads metrics from our blob storage (publicly accessible)
Configured CLI executables: pymarian-evaluate, pymarian-qtdemo, pymarian-mtapi
Added dependencies: none
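The scikit-build-core switch and the CLI executables above would be wired up in pyproject.toml roughly as follows. This is a hedged sketch, not the PR's actual file: the version, dependency list, and `module:function` entry-point paths are placeholders, though the script names match the CLIs listed above.

```toml
[build-system]
requires = ["scikit-build-core", "pybind11"]
build-backend = "scikit_build_core.build"

[project]
name = "pymarian"
version = "0.0.1"  # placeholder

# Console scripts replace the old setup.py entry_points;
# the module paths here are hypothetical.
[project.scripts]
pymarian-evaluate = "pymarian.evaluate:main"
pymarian-qtdemo = "pymarian.qtdemo:main"
pymarian-mtapi = "pymarian.mtapi:main"
```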
How to test
These instructions are added to README in src/python.
Example Usage
mtapi
Launch server
Example request from client
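A client request could be sketched as below. The endpoint path, port, and JSON schema here are assumptions for illustration only; check the README in src/python for the actual request format.

```python
import json

# Hypothetical payload for the pymarian-mtapi server: a list of source
# lines under a "text" key (the real schema may differ).
payload = {"text": ["Hello world"]}
body = json.dumps(payload)

# Sending it (commented out so no live server is required):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/translate",  # hypothetical host/path
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())

print(body)
```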
QtDemo
Checklist