Note: This project requires an NVIDIA GPU and the corresponding CUDA libraries.
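Before building, it can help to confirm that the GPU driver and the CUDA toolkit are visible from the shell. The following is a minimal sketch and not part of this repository: `have_tool` and `check_cuda` are hypothetical helper names, and the sketch assumes a standard installation in which `nvidia-smi` (driver) and `nvcc` (toolkit) are on `PATH`.

```shell
# Hypothetical helper: succeed if the named command is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the assumed CUDA-related tools are missing.
# Returns non-zero if at least one of them is not found.
check_cuda() {
    missing=0
    for tool in nvidia-smi nvcc; do
        if ! have_tool "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_cuda && make marian` would then start the build only when both tools are found.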
git clone https://github.com/ugermann/mtm17.git mtm17
cd mtm17
git submodule update --init
make marian
make baseline-model
echo 'das Haus ist blau und Blumen wachsen in der Sonne .' \
| marian-dev/build/s2s \
-m model/baseline/model.npz \
-v model/baseline/corpus.bpe.de.json \
model/baseline/corpus.bpe.en.json
Edit environment.rc to reflect your local setup. In your current bash shell, run
. environment.rc
to set the environment variables that this setup relies on.
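To catch a forgotten or incomplete `environment.rc` early, a small check can be run right after sourcing it. This is only a sketch: `require_var` is a hypothetical helper, and the assumption that `environment.rc` defines `MTM17_ROOT` is taken from the `export MTM17_ROOT=...` line in the manual setup steps, not from the file itself.

```shell
# Hypothetical helper: fail with a message if the named variable is
# unset or empty. The name must be a plain, trusted variable name,
# because it is expanded through eval.
require_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "environment variable $1 is not set" >&2
        return 1
    fi
}
```

A possible use is `. environment.rc && require_var MTM17_ROOT`.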
Pipe text through scripts/preprocess.sh de (German) or scripts/preprocess.sh en (English) to preprocess data in the same way as the original training data was preprocessed.
Similarly, scripts/postprocess.sh {de|en} converts decoder output back to 'normal' text.
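Put together, preprocessing, decoding, and postprocessing form a single pipeline. The sketch below assumes it is run from the repository root, that both scripts read standard input and write standard output, and that the decoder command is the one from the quick-start example. `translate_de_en` and the `PREPROCESS`, `DECODER`, and `POSTPROCESS` override variables are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: translate German text on stdin to English on stdout.
# Each stage can be overridden through an environment variable, which also
# allows a dry run without a GPU. Relies on sh/bash word splitting of the
# unquoted stage variables.
translate_de_en() {
    pre=${PREPROCESS:-"scripts/preprocess.sh de"}
    dec=${DECODER:-"marian-dev/build/s2s -m model/baseline/model.npz -v model/baseline/corpus.bpe.de.json model/baseline/corpus.bpe.en.json"}
    post=${POSTPROCESS:-"scripts/postprocess.sh en"}
    $pre | $dec | $post
}
```

For example, `echo 'Das Haus ist blau.' | translate_de_en` would send raw German text through all three stages.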
export MTM17_ROOT=/some/path
sudo apt-get update && sudo apt-get install -y cmake git libboost-dev libeigen3-dev libopenblas-base \
libopenblas-dev python python-dev python-pip gfortran zlib1g-dev g++ automake autoconf \
libtool libboost-all-dev libgoogle-perftools-dev libpcre3-dev
cd ${MTM17_ROOT}
git clone https://github.com/marian-nmt/marian-dev.git
cd marian-dev
git checkout nematus
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=release ..
make -j
cd ${MTM17_ROOT}
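With GNU make, `make -j` without a number places no limit on the number of parallel jobs, which can exhaust memory on smaller machines. One way to cap it at the number of available cores is sketched below; `build_jobs` is a hypothetical helper, and `nproc` is assumed to come from GNU coreutils.

```shell
# Hypothetical helper: print a job count for make, falling back to 1
# when nproc is not available.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}
```

The build step would then read `make -j"$(build_jobs)"` instead of plain `make -j`.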
https://github.com/marian-nmt/moses-scripts.git
https://github.com/rsennrich/subword-nmt
wget -r -e robots=off -nH -np -R index.html* http://data.statmt.org/mtm17/models/de-en//fs/vali0/www/data.statmt.org/summa/mt/models/de-en/20170620/
Update: Turns out Marian can't handle the 2017 deep models quite yet, so we'll be using the UEdin Models from WMT16.
cd /mt
git clone https://github.com/marian-nmt/marian-dev.git
cd marian-dev
git checkout nematus
mkdir build
cd build
cmake ..
make -j
echo 'das Haus ist blau .' | /mt/marian-dev/build/s2s -c /share/mtm17-ug/s2s-conf.yaml