Closed alvations closed 9 years ago
This is interesting.
Seems like there's some permission problems when I tried to get the training tools through wget:
alvas@ubi:~/test-out-of-box/training-tools$ ls -lah *
-rw-rw-r-- 1 alvas alvas 914K Jan 29 16:16 d4norm
-rw-rw-r-- 1 alvas alvas 919K Jan 29 16:16 hmmnorm
-rw-rw-r-- 1 alvas alvas 2.1K Jan 29 16:16 merge_alignment.py
-rw-rw-r-- 1 alvas alvas 1.1M Jan 29 16:16 mgiza
-rw-rw-r-- 1 alvas alvas 336K Jan 29 16:16 mkcls
-rw-rw-r-- 1 alvas alvas 43K Jan 29 16:16 plain2snt
-rw-rw-r-- 1 alvas alvas 38K Jan 29 16:16 snt2cooc
-rw-rw-r-- 1 alvas alvas 29K Jan 29 16:16 snt2coocrmp
-rw-rw-r-- 1 alvas alvas 33K Jan 29 16:16 snt2plain
-rw-rw-r-- 1 alvas alvas 48K Jan 29 16:16 symal
After I did a chmod 777, it works:
alvas@ubi:~/test-out-of-box/training-tools$ ls
d4norm hmmnorm merge_alignment.py mgiza mkcls plain2snt snt2cooc snt2coocrmp snt2plain symal
alvas@ubi:~/test-out-of-box/training-tools$ chmod 777 *
alvas@ubi:~/test-out-of-box/training-tools$ cd ..
alvas@ubi:~/test-out-of-box$ ls
Europarl.de-en.de Europarl.de-en.en LexicalTranslationModel.pm training-tools train-model.perl
alvas@ubi:~/test-out-of-box$ perl train-model.perl --external-bin-dir training-tools/ --mgiza
Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip
ERROR: use --corpus to specify corpus at train-model.perl line 379.
But is there a safer way to change the permissions? What permissions does train-model.perl need? Doing chmod 777 works, but it's a little unsafe.
It's sort of digging into the closet, but it seems like train-model.perl is behaving weirdly.
When I ran:
perl train-model.perl --root-dir . --model-dir model --corpus Europarl.de-en --f en --e de --external-bin-dir "training-tools" --mgiza --parallel --first-step 1 --last-step 3
mkcls and mgiza complete, but when the script tries to stitch the results together, train-model.perl starts to behave weirdly and looks for moses/bin/symal instead of $_EXTERNAL_BINDIR/symal.
Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip
(1) preparing corpus @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/corpus
(1.0) selecting factors @ Tue May 19 02:05:17 CEST 2015
Forking...
(1.1) running mkcls @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.en -V/home/alvas/test-out-of-box/corpus/en.vcb.classes opt
/home/alvas/test-out-of-box/corpus/en.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/en.vcb @ Tue May 19 02:05:17 CEST 2015
(1.1) running mkcls @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.de -V/home/alvas/test-out-of-box/corpus/de.vcb.classes opt
/home/alvas/test-out-of-box/corpus/de.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/de.vcb @ Tue May 19 02:05:17 CEST 2015
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/en-de-int-train.snt @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/corpus/en-de-int-train.snt already in place, reusing
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/de-en-int-train.snt @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/corpus/de-en-int-train.snt already in place, reusing
Waiting for mkcls processes to finish...
(2) running giza @ Tue May 19 02:05:17 CEST 2015
(2.1a) running snt2cooc de-en @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/giza.de-en
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
(2.1a) running snt2cooc en-de @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/giza.en-de
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
END.
END.
(2.1b) running giza de-en @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza -CoocurrenceFile /home/alvas/test-out-of-box/giza.de-en/de-en.cooc -c /home/alvas/test-out-of-box/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.de-en/de-en -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/en.vcb -t /home/alvas/test-out-of-box/corpus/de.vcb
/home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz seems finished, reusing.
Waiting for second GIZA process...
(2.1b) running giza en-de @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza -CoocurrenceFile /home/alvas/test-out-of-box/giza.en-de/en-de.cooc -c /home/alvas/test-out-of-box/corpus/en-de-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.en-de/en-de -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/de.vcb -t /home/alvas/test-out-of-box/corpus/en.vcb
/home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz seems finished, reusing.
(3) generate word alignment @ Tue May 19 02:05:17 CEST 2015
Combining forward and inverted alignment from files:
/home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.{bz2,gz}
/home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.{bz2,gz}
Executing: mkdir -p /home/alvas/test-out-of-box/model
Executing: /home/alvas/test-out-of-box/training/giza2bal.pl -d "gzip -cd /home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz" -i "gzip -cd /home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz" |/home/alvas/test-out-of-box/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" > /home/alvas/test-out-of-box/model/aligned.grow-diag-final
sh: 1: /home/alvas/test-out-of-box/training/giza2bal.pl: not found
sh: 1: /home/alvas/test-out-of-box/../bin/symal: not found
Exit code: 127
ERROR: Can't generate symmetrized alignment file
Also, $SCRIPTS_ROOTDIR seems to control where train-model.perl finds the complementary scripts. This is unavoidable unless we allow $SCRIPTS_ROOTDIR to be customizable, but that would lead to a whole lot of other problems.
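For anyone hitting the same error, here is a minimal diagnostic sketch. It only checks the two relative paths that the log above shows being executed (training/giza2bal.pl and ../bin/symal, both relative to the directory printed as SCRIPTS_ROOTDIR). The function name check_layout and the temporary demo directory are made up for illustration; they are not part of Moses.

```shell
#!/bin/sh
# check_layout ROOT: report whether the helpers invoked in the log exist
# relative to ROOT (the directory printed as "Using SCRIPTS_ROOTDIR").
# Returns the number of missing helpers (0 means the layout matches).
check_layout() {
    root=$1
    missing=0
    for rel in training/giza2bal.pl ../bin/symal; do
        if [ -x "$root/$rel" ]; then
            echo "ok:      $root/$rel"
        else
            echo "missing: $root/$rel"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Demo on a throwaway directory that has scripts/ but no bin/ next to it,
# similar to the out-of-box setup described in this thread.
demo=$(mktemp -d)
mkdir -p "$demo/scripts"
check_layout "$demo/scripts" || echo "layout does not match what the script expects"
```

Running it against the real SCRIPTS_ROOTDIR before training would show up front whether step 3 is likely to fail.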
Solution: Use Moses scripts as they are compiled and installed normally.
Enlightenment: Training scripts don't work out of the box.
For more info: https://github.com/alvations/usaarhat-repo/blob/master/Align-A-Line.md
Use the ‘x’ permission bit on anything that you want to be able to execute. Strictly speaking if it's in your home directory you probably only need that permission for the file's owner (you), but the usual and simple thing is to allow it for all users.
So, to permit execution of a file, do:
chmod a+x $MYFILE
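As a small illustration of the owner-only variant mentioned above, the sketch below uses u+x instead of a+x on placeholder files. TOOLS_DIR is a temporary stand-in for the training-tools directory, and the files are empty dummies, so nothing here touches a real install.

```shell
#!/bin/sh
# Sketch: grant execute permission to the owner only, on placeholder files.
# TOOLS_DIR is a hypothetical stand-in for the training-tools directory.
TOOLS_DIR=$(mktemp -d)
for tool in mgiza mkcls snt2cooc symal; do
    : > "$TOOLS_DIR/$tool"          # empty placeholder file
    chmod 664 "$TOOLS_DIR/$tool"    # same mode as in the ls -lah listing (rw-rw-r--)
done

# u+x adds execute for the owner only; a+x would add it for all users.
chmod u+x "$TOOLS_DIR"/*

# Report which files the current user can now execute.
for f in "$TOOLS_DIR"/*; do
    if [ -x "$f" ]; then
        echo "executable: $(basename "$f")"
    fi
done
```

Either form avoids the write-for-everyone part of chmod 777, which is the unsafe bit.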
@jtv, thanks for the chmod permission solution!!! But the problems that come after the permission fix are a little harder to resolve because they're closely tied to the pseudo-static path that train-model.perl tries to use.
Sorry to jump into a closed thread, but I'm having a similar issue and I'm not sure why this was closed. train-model.perl is failing to find symal because it's looking for "$SCRIPTS_ROOTDIR/../bin/symal" and not "$_EXTERNAL_BINDIR/symal" or even the symal in the Moses bin dir (which for me is not a sibling of $SCRIPTS_ROOTDIR). Here's the offending line: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/train-model.perl#L466
I'm using EMS, and here are the relevant paths from my config file:
moses-src-dir = /NLP_TOOLS/mt_tools/moses/v3.0-release
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/src/scripts
external-bin-dir = /NLP_TOOLS/mt_tools/mgizapp/latest/bin
Note that while the bin-dir is under $moses-src-dir/bin, the script-dir is another level lower ($moses-src-dir/src/scripts). This install is on my university's cluster and I don't have permissions to move things around.
Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?
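To make the question concrete, here is a sketch of the kind of lookup order being asked about: try the external bin dir first, then the Moses bin dir, and only then the sibling-of-scripts guess. This is not what train-model.perl does; find_tool, the variable names, and the temporary directories are all hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# find_tool NAME DIR...: print the first DIR/NAME that is executable.
# Hypothetical helper, not part of Moses.
find_tool() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -n "$dir" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Demo layout (assumed): symal lives in the Moses bin dir only, and the
# scripts dir sits one level lower, so "<scripts>/../bin" does not exist.
base=$(mktemp -d)
EXTERNAL_BIN="$base/mgizapp/bin"
MOSES_BIN="$base/moses/bin"
SCRIPTS_ROOT="$base/moses/src/scripts"
mkdir -p "$EXTERNAL_BIN" "$MOSES_BIN" "$SCRIPTS_ROOT"
: > "$MOSES_BIN/symal"
chmod u+x "$MOSES_BIN/symal"

# Candidate directories are tried in order; the first hit wins.
SYMAL=$(find_tool symal "$EXTERNAL_BIN" "$MOSES_BIN" "$SCRIPTS_ROOT/../bin")
echo "symal resolved to: $SYMAL"
```

With a fallback like this, the sibling-directory assumption would only matter when neither configured bin directory contains the tool.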
@goodmami Last year, I ended up modifying the paths in train-model.perl to suit my machine. I changed all the paths to the binaries and to the other specific Perl scripts to static paths.
The assumption behind my $SYMAL = "$SCRIPTS_ROOTDIR/../bin/symal"; is that Moses is installed as per the instructions from http://www.statmt.org/moses/?n=Development.GetStarted, such that the Moses directory is laid out like this:
alvas@ubi:~$ cd mosesdecoder/
alvas@ubi:~/mosesdecoder$ ls
biconcor defer mert OnDiskPt scripts
bin doc mingw phrase-extract search
bjam jam-files mira previous.sh symal
BUILD-INSTRUCTIONS.txt Jamroot misc regression-testing util
contrib lib moses sample-models vw
cruise-control lm moses-cmd sample-models.tgz
alvas@ubi:~/mosesdecoder$ cd scripts/
alvas@ubi:~/mosesdecoder/scripts$ ls
analysis generic other regression-testing tests Transliteration
ems Jamfile README server tokenizer
fuzzy-match OSM recaser share training
alvas@ubi:~/mosesdecoder/scripts$ cd ../bin
alvas@ubi:~/mosesdecoder/bin$ ls
1-1-Extraction filter processLexicalTable
biconcor fragment processPhraseTable
build_binary generateSequences project-cache.jam
config.log kbmira prunePhraseTable
consolidate lexical-reordering-score query
consolidate-direct lmbrgrid queryLexicalTable
consolidate-reverse lmplz queryOnDiskPt
CreateOnDiskPt merge-sorted queryPhraseTable
dump_counts mert relax-parse
evaluator mira score
extract moses sentence-bleu
extract-ghkm moses_chart statistics
extract-lex pcfg-extract symal
extract-mixed-syntax pcfg-score TMining
extractor phrase-lookup
extract-rules pro
The TL;DR way would be something like:
cd /path/to/
wget http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/linux-64bit.tgz
tar zxvf linux-64bit.tgz
mv linux-64bit mosesdecoder
chmod a+x -R mosesdecoder
Since the scripts and EMS should not use the source directly, in the config file you can do this:
moses-src-dir = /path/to/mosesdecoder
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/scripts
external-bin-dir = $moses-src-dir/training-tools
Thanks @alvations. I didn't install it myself, and our sysadmin claims to have followed the normal install. According to the link you provided (http://www.statmt.org/moses/?n=Development.GetStarted) (emphasis added):
--install-scripts=/path/to/scripts
copies scripts into a directory. Does not install if missing. No argument defaults to PREFIX/scripts.
Since the directory didn't exist as a sibling to the bindir, I'm guessing he didn't provide the --install-scripts option, which in the installation instructions is under "Popular additional bjam options" and not the "easy setup" heading. Even if the option is used, it's possible to provide a path that isn't the default, in which case the train-model.perl script would still fail because of the directory location assumption.
Anyway, we fixed that problem by symlinking the scripts directory at the expected location, but my original question still stands (emphasis added):
Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?
I think this hardcoding of the path assumption is a bug. I'd be happy to submit a PR, but I'm not sure what to fix. Maybe I'd just need to change whatever calls train-model.perl to provide the appropriate command-line options, but maybe I'd also need to change train-model.perl to actually use them?
Thanks!
When I tried the following on Ubuntu 14.10:
It throws the error:
When I tried the full path:
It throws the same error.
Any clues to why this happens?
For diagnostics, here's the directory structure: