microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

How to create dictionary dict.lg.txt in MASS supNMT #172

Open Ashmari opened 3 years ago

Ashmari commented 3 years ago

I tried MASS unsupNMT and then supNMT, but with supNMT I get the error below. I am also not clear about how to create dict.lg.txt. Do we need to create the data directory manually, as described in the instructions?

I am getting this error after running generate_enzh_data.sh

Namespace(alignfile=None, cpu=False, criterion='cross_entropy', dataset_impl='cached', destdir='data//processed/', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=True, optimizer='nag', padding_factor=8, seed=1, source_lang='en', srcdict='data//mono//dict.en.txt', target_lang=None, task='cross_lingual_lm', tbmf_wrapper=False, tensorboard_logdir='', testpref=None, tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, trainpref='data//mono//train', user_dir=None, validpref='data//mono//valid', workers=20)
Traceback (most recent call last):
  File "/home/ashmari/anaconda3/envs/MassN/bin/fairseq-preprocess", line 8, in sys.exit(cli_main())
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq_cli/preprocess.py", line 267, in cli_main main(args)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq_cli/preprocess.py", line 80, in main src_dict = task.load_dictionary(args.srcdict)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/tasks/cross_lingual_lm.py", line 82, in load_dictionary return MaskedLMDictionary.load(filename)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/data/dictionary.py", line 181, in load raise fnfe
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/data/dictionary.py", line 175, in load with open(f, 'r', encoding='utf-8') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'data//mono//dict.en.txt'
mv: cannot stat 'data//processed//train.en-None.en.bin': No such file or directory
mv: cannot stat 'data//processed//train.en-None.en.idx': No such file or directory
mv: cannot stat 'data//processed//valid.en-None.en.bin': No such file or directory
mv: cannot stat 'data//processed//valid.en-None.en.idx': No such file or directory

Namespace(alignfile=None, cpu=False, criterion='cross_entropy', dataset_impl='cached', destdir='data//processed/', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=True, optimizer='nag', padding_factor=8, seed=1, source_lang='zh', srcdict='data//mono//dict.zh.txt', target_lang=None, task='cross_lingual_lm', tbmf_wrapper=False, tensorboard_logdir='', testpref=None, tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, trainpref='data//mono//train', user_dir=None, validpref='data//mono//valid', workers=20)
Traceback (most recent call last):
  File "/home/ashmari/anaconda3/envs/MassN/bin/fairseq-preprocess", line 8, in sys.exit(cli_main())
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq_cli/preprocess.py", line 267, in cli_main main(args)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq_cli/preprocess.py", line 80, in main src_dict = task.load_dictionary(args.srcdict)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/tasks/cross_lingual_lm.py", line 82, in load_dictionary return MaskedLMDictionary.load(filename)
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/data/dictionary.py", line 181, in load raise fnfe
  File "/home/ashmari/anaconda3/envs/MassN/lib/python3.7/site-packages/fairseq/data/dictionary.py", line 175, in load with open(f, 'r', encoding='utf-8') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'data//mono//dict.zh.txt'

What is going wrong? Please help. Thank you.
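
For context on the missing file: the traceback shows fairseq trying to open data/mono/dict.en.txt (and dict.zh.txt) as a plain-text dictionary, and fairseq's dictionary loader reads one "token count" pair per line. The following is a minimal sketch of producing such a file from already tokenized (for example BPE-processed) text. It is not the MASS authors' procedure. The function name `build_dictionary` and the `min_count` parameter are hypothetical, and whether MASS supNMT expects the vocabulary to be built exactly this way is an assumption that should be checked against the repository README.

```python
from collections import Counter
from pathlib import Path


def build_dictionary(corpus_path, dict_path, min_count=1):
    """Write a 'token count' vocabulary file from whitespace-tokenized text.

    Hypothetical helper for illustration; special symbols are not written,
    on the assumption that the loader adds them itself.
    """
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            counts.update(line.split())

    # Keep tokens seen at least min_count times, most frequent first.
    # Ties are broken alphabetically so the output is deterministic.
    entries = sorted(
        ((tok, n) for tok, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )

    # Create the target directory if it does not exist yet.
    Path(dict_path).parent.mkdir(parents=True, exist_ok=True)
    with open(dict_path, "w", encoding="utf-8") as out:
        for token, count in entries:
            out.write(f"{token} {count}\n")
    return len(entries)
```

With the paths from the log above, a call might look like `build_dictionary("data/mono/train.en", "data/mono/dict.en.txt")`, repeated for `zh`. This assumes the training text lives at data/mono/train.en, which is how fairseq-preprocess combines `trainpref='data//mono//train'` with `source_lang='en'`.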