jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License

Missing $ on loop variable at preprocess.perl line 1. #89

Closed ankur6ue closed 4 years ago

ankur6ue commented 5 years ago

Thanks for making this available. I'm trying to run this code on Windows and getting the following error:

Missing $ on loop variable at preprocess.perl line 1.

When I run:

for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != "test" ]]; then sed -i "$ d" $f; fi; done; done
for l in en de; do for f in data/multi30k/*.$l; do perl tokenizer.perl -a -no-escape -l $l -q < $f > $f.atok; done; done
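For reference, a rough, untested Python port of those two loops is sketched below as a way to run the preprocessing without bash; the data/multi30k/ location and a tokenizer.perl sitting in the working directory are assumptions, not something confirmed by the repo.

# Untested sketch of the two shell loops above in Python; assumes the
# Multi30k text files are in data/multi30k/ and that perl plus
# tokenizer.perl are reachable from the current directory.
import glob
import subprocess

# Loop 1: drop the last line of each file, mirroring `sed -i "$ d" $f`,
# skipping the test files as the if-check intends.
for lang in ("en", "de"):
    for path in glob.glob(f"data/multi30k/*.{lang}"):
        if "test" in path:
            continue
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write("\n".join(lines[:-1]) + "\n")

# Loop 2: tokenize every file with the Moses tokenizer, writing <file>.atok.
for lang in ("en", "de"):
    for path in glob.glob(f"data/multi30k/*.{lang}"):
        with open(path, "rb") as src, open(path + ".atok", "wb") as dst:
            subprocess.run(
                ["perl", "tokenizer.perl", "-a", "-no-escape", "-l", lang, "-q"],
                stdin=src, stdout=dst, check=True,
            )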

Can someone just make the data files that are input to the training code available? That way I won't have to run the preprocessing steps.

Thanks!

crazysal commented 5 years ago

https://drive.google.com/open?id=1SR0nkSfdXHxidGmPVnQvtlNI4kMt7HI2

P.S. You might still run into path, environment, etc. issues on Windows. I'd advise installing Linux.

zealseeker commented 5 years ago

You didn't read the README carefully. As mentioned there, you should first download the utilities: https://github.com/jadore801120/attention-is-all-you-need-pytorch#some-useful-tools

wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
sed -i "s/$RealBin\/..\/share\/nonbreaking_prefixes//" tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl
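For Windows machines without wget and sed, here is a minimal, untested Python sketch of the same downloads and path patch; the URLs are copied from the commands above, everything else is an assumption.

# Untested sketch: fetch the Moses scripts listed above and strip the
# nonbreaking_prefixes path from tokenizer.perl, as the sed command does,
# so tokenizer.perl looks for the nonbreaking_prefix files next to itself.
import urllib.request

urls = [
    "https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl",
    "https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de",
    "https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en",
    "https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl",
]
for url in urls:
    urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])

with open("tokenizer.perl", encoding="utf-8") as f:
    script = f.read()
with open("tokenizer.perl", "w", encoding="utf-8", newline="\n") as f:
    f.write(script.replace("/../share/nonbreaking_prefixes", ""))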

mrdrozdov commented 5 years ago

Even if you follow those instructions, I found I still needed to set mydir to the code directory in the tokenizer.perl file.

jadore801120 commented 4 years ago

@ankur6ue I haven't tried it on Windows, thanks for pointing it out!