Error with pre-trained word embeddings

michellegiang commented 6 years ago

Hi,

When I run the test with your pre-trained word embeddings:

. ./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed"

I get the error below. Could you please let me know how to solve it? Also, how can I get the M2 score instead of the BLEU score?

(michelle) michelle@k:~/mlc/mlconvgec2018$ ./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed"
++ source paths.sh
+++++ dirname paths.sh
++++ cd .
++++ pwd
+++ BASE_DIR=/home/michelle/mlc/mlconvgec2018
+++ DATA_DIR=/home/michelle/mlc/mlconvgec2018/data
+++ MODEL_DIR=/home/michelle/mlc/mlconvgec2018/models
+++ SCRIPTS_DIR=/home/michelle/mlc/mlconvgec2018/scripts
+++ SOFTWARE_DIR=/home/michelle/mlc/mlconvgec2018/software
++ '[' 4 -ge 4 ']'
++ input_file=/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src
++ output_dir=/home/michelle/mlc/test
++ device=2
++ model_path=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed
++ '[' 4 -eq 6 ']'
++ '[' -d /home/michelle/mlc/mlconvgec2018/models/mlconv_embed ']'
+++ ls /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
+++ tr '\n' ' '
+++ sed 's| \([^$]\)| --path \1|g'
++ models='/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt '
++ echo /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
++ FAIRSEQPY=/home/michelle/mlc/mlconvgec2018/software/fairseq-py
++ NBEST_RERANKER=/home/michelle/mlc/mlconvgec2018/software/nbest-reranker
++ beam=12
++ nbest=12
++ threads=12
++ mkdir -p /home/michelle/mlc/test
++ /home/michelle/mlc/mlconvgec2018/scripts/apply_bpe.py -c /home/michelle/mlc/mlconvgec2018/models/bpe_model/train.bpe.model
++ CUDA_VISIBLE_DEVICES=2
++ python3.6 /home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py --no-progress-bar --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin
Traceback (most recent call last):
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py", line 167, in <module>
    main()
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py", line 41, in main
    models, dataset = utils.load_ensemble_for_inference(args.path, args.data)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/utils.py", line 127, in load_ensemble_for_inference
    model = build_model(args, dataset)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/utils.py", line 31, in build_model
    return getattr(models, args.model).build_model(args, dataset)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 541, in build_model
    dictionary=dataset.src_dict
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 100, in __init__
    self.embed_tokens = load_embeddings(embed_path, dictionary, self.embed_tokens)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 22, in load_embeddings
    with open(embed_path) as f_embed:
FileNotFoundError: [Errno 2] No such file or directory: '/home.local/shamil/wiki/wiki.bpe.fasttext/model.vec'

michellegiang commented 6 years ago

Hi,

I also posted another issue, at the link below, about an error I get when training with my own data.

https://github.com/facebookresearch/fairseq-py/issues/129

shamilcm commented 6 years ago

Hi, the current issue seems to be with our fork of fairseq-py, so you can close the issue you opened here: facebookresearch/fairseq-py#129 and re-post the issue here.

shamilcm commented 6 years ago

The issue was due to some hardcoded paths in arguments. It is now fixed here: https://github.com/shamilcm/fairseq-py/commit/ceb2f1200c9e5b8bf42a1033e7638d3e8586609a Can you retry with this?
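For readers hitting the same traceback, here is a minimal sketch of the general idea behind such a fix. It is not the code from the linked commit. It assumes the arguments saved with a checkpoint behave like an `argparse.Namespace`, and the attribute names `encoder_embed_path` and `decoder_embed_path` are hypothetical stand-ins for whatever the fork actually stores. The reasoning, stated as an assumption: at inference time the trained embedding weights come from the checkpoint itself, so a path that only pointed at the initial vectors on the training machine can be dropped when the file is missing.

```python
import os
from argparse import Namespace


def clear_missing_paths(args, path_attrs):
    """Set to None every listed attribute whose path does not exist locally.

    Returns the names of the attributes that were cleared.
    """
    cleared = []
    for name in path_attrs:
        value = getattr(args, name, None)
        if value is not None and not os.path.exists(value):
            setattr(args, name, None)
            cleared.append(name)
    return cleared


# Hypothetical saved arguments; the path is the one from the traceback above.
saved_args = Namespace(
    encoder_embed_path="/home.local/shamil/wiki/wiki.bpe.fasttext/model.vec",
    decoder_embed_path=None,
)
cleared = clear_missing_paths(
    saved_args, ["encoder_embed_path", "decoder_embed_path"]
)
print("cleared:", cleared)
```

Attributes that are already `None`, or that point at a file that does exist, are left untouched.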

Regarding the training issue, can you close it at facebookresearch/fairseq-py#129 and open a new issue here? I will take a look at it.

Also, what data is it trained on? Is it a very small training set with fewer than 30,000 words in the vocabulary?
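One rough way to answer the vocabulary question is to count the distinct whitespace-separated tokens in the training text. The sketch below is only an illustration: the function name `vocabulary_size` and the file name in the usage note are hypothetical, and the dictionary that fairseq-py builds during preprocessing may apply its own thresholds, so the two numbers need not match exactly.

```python
from collections import Counter


def vocabulary_size(lines, min_count=1):
    """Count distinct whitespace-separated tokens seen at least min_count times."""
    counts = Counter(token for line in lines for token in line.split())
    return sum(1 for count in counts.values() if count >= min_count)


# Tiny in-memory example; a real check would iterate over the training file.
sample = ["the cat sat", "the dog sat down"]
print(vocabulary_size(sample))
```

For a real corpus, pass an open file handle instead of a list, for example `with open("train.tok.src", encoding="utf-8") as f: print(vocabulary_size(f))`, where `train.tok.src` is a placeholder name.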

michellegiang commented 6 years ago

Hi, the training data is the Lang-8 Learner Corpus of English v1.0 and NUCLE.

michellegiang commented 6 years ago

Hi Shamil,

So I need to delete software/fairseq-py, download the new version, and reinstall it, right?

Regards, Viet Anh

shamilcm commented 6 years ago

If you installed fairseq-py using setup.py, pull the new changes and run setup.py again. Otherwise, you just need to pull the changes. Only one file changed: fairseq/utils.py

michellegiang commented 6 years ago

Thanks, Shamil. Since I have already installed fairseq-py, could I just copy your new utils.py over the old one?

The reason is that I am using your version of fairseq-py with a newer version of PyTorch. I had some trouble with setup.py build and had to apply some manual patches. (The PyTorch version in your original README has some problems, so I need to use the newest version of PyTorch.)

https://github.com/facebookresearch/fairseq-py/issues/120

shamilcm commented 6 years ago

If you installed by python setup.py develop, just getting utils.py should work.
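A quick way to confirm which copy of a package Python will import is to look up its origin on disk. The sketch below uses only the standard library; the helper name `module_origin` is made up for illustration. If `module_origin("fairseq")` points inside the software/fairseq-py checkout, replacing fairseq/utils.py there should take effect. If it points into site-packages, the installed copy is the one being used and would need updating instead.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Demonstrated on a standard-library package; substitute "fairseq" in practice.
print(module_origin("json"))
```

The call does not import the package, so it is safe to run even when the package itself currently fails to load.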

shamilcm commented 6 years ago

Closing this issue as it has been resolved.

nusnlp / mlconvgec2018

Error with pre-trained word embeddings #3