joshua-decoder / joshua

Joshua Statistical Machine Translation Toolkit
http://joshua-decoder.org/

Berkeleyaligner to print progress to terminal #233

Open lewismc opened 8 years ago

lewismc commented 8 years ago

Right now, when testing Joshua from within the Homebrew recipe, a lot of time is spent on what I think is word alignment. It would be more helpful if we could see some activity while waiting, e.g. a progress bar.

lmcgibbn@LMC-032857 /usr/local(joshua) $ brew test -vd joshua
/usr/local/Library/brew.rb (Formulary::FormulaLoader): loading /usr/local/Library/Formula/joshua.rb
Testing joshua
/usr/local/Library/Homebrew/test.rb (Formulary::FromPathLoader): loading /usr/local/Library/Formula/joshua.rb
==> Downloading https://github.com/joshua-decoder/indian-parallel-corpora/archive/1.0.tar.gz
Already downloaded: /Library/Caches/Homebrew/joshua--indian-parallel-corpora-1.0.tar.gz
==> Verifying joshua--indian-parallel-corpora-1.0.tar.gz checksum
tar xf /Library/Caches/Homebrew/joshua--indian-parallel-corpora-1.0.tar.gz
==> $JOSHUA/bin/pipeline.pl --source bn --target en     --no-prepare --aligner berkeley     --type hiero     --corpus /usr/local/Cellar/joshua/6.0.5/share/bn-en/tok/training.bn-en     --tune /usr/local/Cellar/joshua/6.0.5/share/bn-en/tok/dev.bn-en     --test /usr/local/Cellar/joshua/6.0.5/share/bn-en/tok/devtest.bn-en
[source-numlines] rebuilding...
  dep=/usr/local/Cellar/joshua/6.0.5/share/bn-en/tok/training.bn-en.bn [CHANGED]
  cmd=cat /usr/local/Cellar/joshua/6.0.5/share/bn-en/tok/training.bn-en.bn | wc -l
  took 0 seconds (0s)
[source-numlines] retrieved cached result =>    20788
[berkeley-aligner-chunk-0] rebuilding...
  dep=alignments/0/word-align.conf [CHANGED]
  dep=/private/tmp/joshua20151106-17227-1kriiqu/data/train/splits/corpus.bn.0 [CHANGED]
  dep=/private/tmp/joshua20151106-17227-1kriiqu/data/train/splits/corpus.en.0 [CHANGED]
  dep=alignments/0/training.align [NOT FOUND]
  cmd=java -d64 -Xmx10g -jar /usr/local/Cellar/joshua/6.0.5/libexec/lib/berkeleyaligner.jar ++alignments/0/word-align.conf

The final line above is the one taking a good while!

mjpost commented 8 years ago

You can follow the Berkeley aligner's progress in alignments/NUM/log, where NUM is the number of the alignment shard (the default shard size is 1,000,000 sentences).
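
As a rough sketch of how to check on those logs while the pipeline is running: the helper below prints the last few lines of every shard's log. The function name show_aligner_progress is made up for illustration and is not part of Joshua. It assumes it is run from the pipeline's working directory, the one that contains alignments/ (in the brew test above that appears to be the temporary directory under /private/tmp). The 20,788-line corpus in that run fits in a single shard, so only alignments/0/log would exist there.

```shell
# Hypothetical helper, not part of Joshua: print the tail of each
# Berkeley aligner shard log found under ./alignments.
# Usage: show_aligner_progress [NUM_LINES]   (defaults to 5 lines)
show_aligner_progress() {
  for log in alignments/*/log; do
    # Skip the unexpanded glob when no shard log exists yet.
    [ -f "$log" ] || continue
    echo "== $log =="
    tail -n "${1:-5}" "$log"
  done
}
```

To watch a single shard continuously instead, running tail -f alignments/0/log in a second terminal should do the same job.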