Closed ankur6ue closed 4 years ago
https://drive.google.com/open?id=1SR0nkSfdXHxidGmPVnQvtlNI4kMt7HI2
P.S. You might still run into path or environment errors on Windows. My advice is to install Linux.
You didn't read the README carefully. As is mentioned, you should first download the utilities: https://github.com/jadore801120/attention-is-all-you-need-pytorch#some-useful-tools
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
sed -i "s/$RealBin\/..\/share\/nonbreaking_prefixes//" tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl
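Before moving on, it may help to confirm that all four downloads actually landed in the working directory. The following is only a sketch: `missing_tools` is a made-up helper name, not something shipped with the repo, and it assumes the files keep the names used in the wget commands above.

```shell
# Sketch: print the name of each expected tool file that is missing from a directory.
# missing_tools is a hypothetical helper; the file names are taken from the wget commands.
missing_tools() {
  dir="${1:-.}"   # default to the current directory
  for name in tokenizer.perl nonbreaking_prefix.de nonbreaking_prefix.en multi-bleu.perl; do
    [ -f "$dir/$name" ] || echo "$name"
  done
}
```

Calling `missing_tools .` prints nothing when every file is present, and one file name per line otherwise.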
Even if you follow those instructions, I found that you still need to set mydir to the code directory in the tokenizer.perl file.
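A rough sketch of that manual edit, in case it saves someone a step. It assumes GNU sed and that your copy of tokenizer.perl contains a line of the form `my $mydir = "...";` (check first, the exact line may differ). `set_mydir` is an illustrative name only, and the path should use forward slashes and contain no `|` or `&` characters, since sed would treat those specially.

```shell
# Sketch: rewrite the $mydir assignment in tokenizer.perl to point at a given directory.
# Assumes a line like:  my $mydir = "...";   (verify this in your copy of the script)
set_mydir() {
  script="$1"
  prefix_dir="$2"
  # '|' is the sed delimiter so slashes in the path need no escaping.
  # Single quotes stop the shell from expanding $mydir.
  sed -i 's|^my \$mydir = ".*";|my $mydir = "'"$prefix_dir"'";|' "$script"
}
```

For example, `set_mydir tokenizer.perl "$(pwd)"` would point it at the current directory, assuming the nonbreaking_prefix files were downloaded there.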
@ankur6ue I haven't tried on Windows, thanks for pointing out!
Thanks for making this available. I'm trying to run this code on Windows and getting the following error:
Missing $ on loop variable at preprocess.perl line 1.
This happens when I run:
for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; done; done
for l in en de; do for f in data/multi30k/*.$l; do perl tokenizer.perl -a -no-escape -l $l -q < $f > $f.atok; done; done
Can someone just make the data files that are input to the training code available? That way I won't have to run the preprocessing steps.
Thanks!
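A note on the error itself: "Missing $ on loop variable" is a message from the Perl interpreter. One possible reading (an assumption, nothing in this thread confirms it) is that the shell loops were saved as preprocess.perl and run with perl. The loops are shell commands, so they need a shell such as bash, which on Windows means something like Git Bash or WSL. Below is a sketch of the two loops as shell functions. The function names are made up for illustration, GNU sed is assumed, and the test-file check looks at the file name only rather than the whole path.

```shell
#!/usr/bin/env bash
# Sketch only: the two preprocessing loops as shell functions.
# Assumes GNU sed, plus perl and tokenizer.perl in the current directory.

# Drop the last line of every non-test *.en / *.de file in a directory.
strip_last_line() {
  dir="$1"
  for l in en de; do
    for f in "$dir"/*."$l"; do
      [ -e "$f" ] || continue        # the glob matched nothing
      case "${f##*/}" in             # look at the file name only
        *test*) ;;                   # leave test files untouched
        *) sed -i '$ d' "$f" ;;      # delete the final line in place
      esac
    done
  done
}

# Tokenize every *.en / *.de file, writing <file>.atok next to it.
tokenize_all() {
  dir="$1"
  for l in en de; do
    for f in "$dir"/*."$l"; do
      [ -e "$f" ] || continue
      perl tokenizer.perl -a -no-escape -l "$l" -q < "$f" > "$f.atok"
    done
  done
}
```

Saved as, say, preprocess.sh with two calls appended (`strip_last_line data/multi30k` and `tokenize_all data/multi30k`), it would be run as `bash preprocess.sh` rather than through perl.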