lilt / alignment-scripts

Scripts to preprocess training and test data and to run fast_align and giza
MIT License

Issues with preprocessing and unclear instructions? #11

Closed dsli208 closed 2 years ago

dsli208 commented 2 years ago

I am trying to run the alignment-scripts and am running into some issues during preprocessing. I was confused about what this instruction means; an example would be helpful:

Export install locations for dependencies: export {MOSES_DIR,FASTALIGN_DIR,MGIZA_DIR}=/foo/bar
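The braces in that instruction are Bash brace expansion. The shell expands {MOSES_DIR,FASTALIGN_DIR,MGIZA_DIR}=/foo/bar into three separate NAME=/foo/bar words before export runs, so the one-liner sets all three variables to the same placeholder path. A minimal sketch of what happens (/foo/bar is only the README's placeholder, and this assumes the commands run under bash rather than plain sh):

```shell
#!/usr/bin/env bash
# Brace expansion runs before export does, so one word becomes three:
#   {MOSES_DIR,FASTALIGN_DIR,MGIZA_DIR}=/foo/bar
#   -> MOSES_DIR=/foo/bar FASTALIGN_DIR=/foo/bar MGIZA_DIR=/foo/bar
echo export {MOSES_DIR,FASTALIGN_DIR,MGIZA_DIR}=/foo/bar
# prints: export MOSES_DIR=/foo/bar FASTALIGN_DIR=/foo/bar MGIZA_DIR=/foo/bar

# The real command therefore gives all three variables the same value.
export {MOSES_DIR,FASTALIGN_DIR,MGIZA_DIR}=/foo/bar
printf '%s=%s\n' MOSES_DIR "$MOSES_DIR" FASTALIGN_DIR "$FASTALIGN_DIR" MGIZA_DIR "$MGIZA_DIR"
```

Because each tool normally lives in its own directory, writing one export line per tool, each with its own path, is usually clearer than the brace shorthand.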

When I run the code, I get these errors:

./preprocess/train.sh: line 16: roen.src: No such file or directory
+ ../scripts/lowercase.py
+ for ln_pair in '"roen"' '"enfr"' '"deen"'
+ for suffix in '"src"' '"tgt"'
./preprocess/train.sh: line 16: roen.tgt: No such file or directory
+ ../scripts/lowercase.py
+ for suffix in '"src"' '"tgt"'
+ ../scripts/lowercase.py
+ wait
+ ../scripts/lowercase.py
./preprocess/train.sh: line 16: deen.src: No such file or directory
./preprocess/train.sh: line 16: deen.tgt: No such file or directory
+ cd -
/home/dl1051/alignment-scripts
+ for ln_pair in '"roen"' '"enfr"' '"deen"'
+ for suffix in '"src"' '"tgt"'
+ cat train/roen.lc.src test/roen.lc.src
cat: train/roen.lc.src: No such file or directory

Might this have anything to do with the instruction above?

thomasZen commented 2 years ago

Hi @dsli208,

You first have to download Moses, FastAlign and MGiza. Make sure to also compile FastAlign and MGiza (you can skip MGiza if you only want to run FastAlign). The repository links are listed here: https://github.com/lilt/alignment-scripts#dependencies; each linked repository has its own installation instructions.

Once that's done, set the environment variables to point to those directories. On my installation, I exported them as follows:

export MOSES_DIR=/home/thomas/mosesdecoder-RELEASE-4.0
export MGIZA_DIR=/home/thomas/GitRepos/mgiza
export FASTALIGN_DIR=/home/thomas/GitRepos/fast_align
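Before rerunning preprocessing, it may help to confirm that each variable is set and names an existing directory. A minimal sketch; check_dep_dirs is a hypothetical helper, not part of alignment-scripts, and it only checks that the directories exist, not that the tools inside them were compiled:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of alignment-scripts): report whether each
# dependency variable is set and points to an existing directory.
check_dep_dirs() {
  local name dir status=0
  for name in MOSES_DIR FASTALIGN_DIR MGIZA_DIR; do
    dir="${!name}"   # indirect expansion: the value of the variable named by $name
    if [[ -z "$dir" ]]; then
      echo "$name is not set" >&2
      status=1
    elif [[ ! -d "$dir" ]]; then
      echo "$name=$dir is not a directory" >&2
      status=1
    else
      echo "$name=$dir OK"
    fi
  done
  return "$status"
}

check_dep_dirs || echo "Fix the variables reported above before preprocessing." >&2
```

If you only plan to run FastAlign, MGIZA_DIR can be dropped from the loop.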

If you've never used environment variables or the export command before, it might make sense to skim a tutorial (e.g. this one: https://linuxconfig.org/learning-linux-commands-export).
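The export keyword matters because a script invoked as ./preprocess/train.sh runs as a separate child process, and a child only inherits variables that have been exported. A small demonstration (/opt/fast_align is a placeholder path):

```shell
#!/usr/bin/env bash
unset FASTALIGN_DIR   # start clean in case it is already exported

# A plain assignment is visible only inside the current shell...
FASTALIGN_DIR=/opt/fast_align
bash -c 'echo "child sees: [$FASTALIGN_DIR]"'   # prints: child sees: []

# ...while export hands it to every child process started afterwards.
export FASTALIGN_DIR
bash -c 'echo "child sees: [$FASTALIGN_DIR]"'   # prints: child sees: [/opt/fast_align]
```

Exports last only for the current shell session; to have them in every new interactive bash session, the export lines can be added to ~/.bashrc.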

thomasZen commented 2 years ago

I'm closing this for now, as I hope the comment above helped. @dsli208, feel free to reopen this issue anytime.