moses-smt / mosesdecoder

Moses, the machine translation system
http://www.statmt.org/moses
GNU Lesser General Public License v2.1

Add option "-b" (unbuffer output) to tokenizer scripts #205

Closed: loic-vial closed this 6 years ago

loic-vial commented 6 years ago

Hello,

I propose adding the optional parameter "-b" to the tokenizer scripts that currently lack it, for consistency with those that already have it (tokenizer.perl, detokenizer.perl, normalize-punctuation.perl, tokenizer_PTB.perl).

This parameter disables output buffering, which is necessary when you want to chain script calls with pipes in interactive mode.

For instance, suppose you have a bash script (say test.sh) containing the line "./normalize-punctuation.perl | ./tokenizer.perl". If you run it interactively (./test.sh), the output is buffered and therefore does not appear after each input line, as one would expect. To make the pipeline interactive, you have to pass "-b" to every Perl script in it: "./normalize-punctuation.perl -b | ./tokenizer.perl -b"
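As an illustration of what "-b" does, here is a minimal sketch of a Moses-style line filter (a lowercaser) with the flag added. It is hypothetical and is not the actual Moses code: the sub name `process_stream` and the option parsing are assumptions for illustration, and it is only assumed that the existing scripts do something equivalent. When "-b" is given, autoflush is enabled on the output handle. This is the same effect as setting Perl's `$|` variable on STDOUT, so each line is written as soon as it is processed instead of waiting in the buffer.

```perl
#!/usr/bin/env perl
# Hypothetical sketch of a Moses-style filter (a lowercaser) with an
# optional "-b" flag that unbuffers the output. Not the actual Moses code.
use strict;
use warnings;
use IO::Handle;    # provides the autoflush() method on filehandles

# Copy lines from $in to $out, lowercased. If $unbuffered is true, enable
# autoflush on $out so every line is written immediately rather than being
# held in Perl's output buffer until it fills or the program exits.
sub process_stream {
    my ($in, $out, $unbuffered) = @_;
    $out->autoflush(1) if $unbuffered;
    while (my $line = <$in>) {
        print {$out} lc($line);
    }
    return;
}

# Minimal option parsing: only "-b" is recognised in this sketch.
my $unbuffered = 0;
foreach my $arg (@ARGV) {
    if ($arg eq '-b') {
        $unbuffered = 1;
    } else {
        die "Unknown option: $arg\n";
    }
}

binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');
process_stream(\*STDIN, \*STDOUT, $unbuffered);
```

Saved under a hypothetical name such as lowercase-sketch.perl, it could then sit at any point in an interactive pipeline, e.g. "./normalize-punctuation.perl -b | ./lowercase-sketch.perl -b", and each line would appear as soon as it is typed. Every stage needs the flag, because a single buffering stage is enough to hold back the output of the whole pipeline.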

Without a "-b" option, the other scripts (deescape-special-chars.perl, lowercase.perl, etc.) cannot be used interactively in a shell script pipeline.

hieuhoang commented 6 years ago

thanks!