twairball / fairseq-zh-en

NMT for Chinese-English using fairseq

fairseq command not found #2

Closed buttercutter closed 6 years ago

buttercutter commented 6 years ago

Besides the "fairseq: command not found" error, I ran into several missing-file and missing-directory errors when running wmt17_prepare.sh. I have already installed fairseq from https://github.com/pytorch/fairseq

Could anyone advise?

[phung@archlinux fairseq-zh-en]$ ls
challenger.md  data-bin  'merge blanks.ipynb'  nltk_data  README.md  tmp  wmt17_generate.sh  wmt17_train.sh
data  'Dataset misaligned.ipynb'  mosesdecoder  preprocess  subword-nmt  trainings  wmt17_prepare.sh
[phung@archlinux fairseq-zh-en]$ sh ./wmt17_prepare.sh
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
DEBUG:jieba:Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.834 seconds.
DEBUG:jieba:Loading model cost 0.834 seconds.
Prefix dict has been built succesfully.
DEBUG:jieba:Prefix dict has been built succesfully.
INFO:prepare:tokenizing: tmp/wmt17_en_zh/training/news-commentary-v12.zh-en.en
INFO:tokenizer: [0] nltk.word_tokenize: 1929 or 1989?

Traceback (most recent call last):
  File "./preprocess/wmt.py", line 58, in <module>
    prepare.prepare_dataset(DATA_DIR, TMP_DIR, ds)
  File "/home/phung/Documents/Grive/Personal/Coursera/Machine_Learning/fairseq/fairseq-zh-en/preprocess/prepare.py", line 79, in prepare_dataset
    tokenized = tokenizer.tokenize_file(tmp_filepath)
  File "/home/phung/Documents/Grive/Personal/Coursera/Machine_Learning/fairseq/fairseq-zh-en/preprocess/tokenizer.py", line 60, in tokenize_file
    _tokenized = tokenize(line, is_sgm, is_zh, lower_case, delim)
  File "/home/phung/Documents/Grive/Personal/Coursera/Machine_Learning/fairseq/fairseq-zh-en/preprocess/tokenizer.py", line 40, in tokenize
    _tok = jieba.cut(_line.rstrip('\r\n')) if is_zh else nltk.word_tokenize(_line)
  File "/usr/lib/python3.6/site-packages/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/usr/lib/python3.6/site-packages/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/usr/lib/python3.6/site-packages/nltk/data.py", line 836, in load
    opened_resource = _open(resource_url)
  File "/usr/lib/python3.6/site-packages/nltk/data.py", line 954, in _open
    return find(path_, path + ['']).open()
  File "/usr/lib/python3.6/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
LookupError:


Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

Searched in:

  • '/home/phung/nltk_data'
  • '/usr/share/nltk_data'
  • '/usr/local/share/nltk_data'
  • '/usr/lib/nltk_data'
  • '/usr/local/lib/nltk_data'
  • '/usr/nltk_data'
  • '/usr/share/nltk_data'
  • '/usr/lib/nltk_data'
  • ''

./wmt17_prepare.sh: line 12: ../mosesdecoder/scripts/training/clean-corpus-n.perl: No such file or directory
./wmt17_prepare.sh: line 13: ../mosesdecoder/scripts/training/clean-corpus-n.perl: No such file or directory
./wmt17_prepare.sh: line 14: ../mosesdecoder/scripts/training/clean-corpus-n.perl: No such file or directory
Encoding subword with BPE using ops=32000
./wmt17_prepare.sh: line 23: data/wmt17_en_zh/train.clean.en: No such file or directory
./wmt17_prepare.sh: line 24: data/wmt17_en_zh/train.clean.zh: No such file or directory
Applying vocab to training
./wmt17_prepare.sh: line 27: data/wmt17_en_zh/train.clean.en: No such file or directory
./wmt17_prepare.sh: line 28: data/wmt17_en_zh/train.clean.zh: No such file or directory
Generating vocab: vocab.32000.bpe.en
./wmt17_prepare.sh: line 32: ../subword-nmt/get_vocab.py: No such file or directory
cat: data/wmt17_en_zh/train.32000.bpe.en: No such file or directory
Generating vocab: vocab.32000.bpe.zh
./wmt17_prepare.sh: line 35: ../subword-nmt/get_vocab.py: No such file or directory
cat: data/wmt17_en_zh/train.32000.bpe.zh: No such file or directory
Applying vocab to valid
./wmt17_prepare.sh: line 39: data/wmt17_en_zh/valid.clean.en: No such file or directory
./wmt17_prepare.sh: line 40: data/wmt17_en_zh/valid.clean.zh: No such file or directory
Applying vocab to test
./wmt17_prepare.sh: line 44: data/wmt17_en_zh/test.clean.en: No such file or directory
./wmt17_prepare.sh: line 45: data/wmt17_en_zh/test.clean.zh: No such file or directory
Preprocessing datasets...
./wmt17_prepare.sh: line 52: fairseq: command not found
[phung@archlinux fairseq-zh-en]$
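
The first failure in the log is the NLTK LookupError for punkt, and the error message itself suggests `nltk.download('punkt')`. A minimal sketch of that fix is below, assuming an NLTK version like the one in the traceback. The helper name `ensure_nltk_resource` and its parameters are hypothetical and not part of this repo; `nltk.data.find` and `nltk.download` are the NLTK calls that appear in the traceback and the error message.

```python
def ensure_nltk_resource(resource_path="tokenizers/punkt", package="punkt"):
    """Return True if the NLTK resource is available, downloading it if needed.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    import nltk  # imported lazily so this file loads even without NLTK

    try:
        # Searches the same directories listed under "Searched in:" above.
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Same fix the error message suggests: nltk.download('punkt').
        return bool(nltk.download(package))
```

Calling `ensure_nltk_resource()` once before `sh ./wmt17_prepare.sh` should address the tokenizer error only; the later "No such file or directory" and "command not found" errors have separate causes.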

twairball commented 6 years ago

Sorry, this repo uses the Lua Torch version of fairseq, not the PyTorch version.
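
For anyone hitting the same log, a rough pre-flight sketch can report what is missing before the script runs. It only checks that a command is on PATH and that files exist, using the names taken from the error output above. The function name `missing_prerequisites` is hypothetical, and the relative paths are assumed to be resolved from the directory the script runs in.

```python
import os
import shutil


def missing_prerequisites(commands, paths):
    """Return (commands not found on PATH, paths that do not exist)."""
    missing_cmds = [c for c in commands if shutil.which(c) is None]
    missing_paths = [p for p in paths if not os.path.exists(p)]
    return missing_cmds, missing_paths


# Names copied from the error messages in the log above.
cmds, paths = missing_prerequisites(
    ["fairseq"],
    [
        "../mosesdecoder/scripts/training/clean-corpus-n.perl",
        "../subword-nmt/get_vocab.py",
    ],
)
for name in cmds:
    print("command not on PATH:", name)
for path in paths:
    print("missing file:", path)
```

If `fairseq` is reported as missing while the PyTorch package is installed, that matches the point above: the script expects the command provided by the Lua Torch version.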