Hi, I'm trying to set up multilingual LM pretraining and supervised NMT, and I'm having trouble following the sample code for pretraining and fine-tuning. I'm planning to build a single NMT model for the EN-TL and EN-CEB pairs.
I have prepared my data following the MASS-supNMT docs,
and I am using the following command for pretraining:
fairseq-train $data_dir \
    --user-dir $user_dir \
    --save-dir $save_dir \
    --task xmasked_seq2seq \
    --source-langs ceb,en,tl \
    --target-langs ceb,en,tl \
    --langs ceb,en,tl \
    --arch xtransformer \
    --mass_steps ceb-ceb,en-en,tl-tl \
    --memt_steps en-ceb,en-tl \
    --optimizer adam --adam-betas '(0.9,0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --lr 0.00005 --min-lr 1e-09 \
    --criterion label_smoothed_cross_entropy \
    --max-tokens 4096 \
    --max-update 100000 --max-epoch 10 \
    --dropout 0.1 --relu-dropout 0.1 --attention-dropout 0.1 \
    --share-decoder-input-output-embed \
    --valid-lang-pairs en-ceb,en-tl \
    --word_mask 0.3 \
    --ddp-backend=no_c10d
However, I keep on getting this error:
raise FileNotFoundError('Not Found available {}-{} para dataset for ({}) lang'.format(split, key, src))
FileNotFoundError: Not Found available valid-ceb-en para dataset for (ceb) lang
I tried creating copies of the datasets named in the valid-ceb-en format, but the error still occurs.
I hope someone can help me set up an experiment in a multilingual setting.
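In case it helps with debugging, here is a minimal sketch of how the contents of the data directory could be checked against the split and language codes named in the error message. `list_split_files` is a hypothetical helper, not part of fairseq or MASS. It assumes only that dataset file names start with the split name and contain the language codes separated by `.`, `-` or `_`; the exact file naming that the task expects is not confirmed here.

```python
import re
from pathlib import Path


def list_split_files(data_dir, split, langs):
    """List files under data_dir whose name starts with `split` and
    mentions every language code in `langs` as a separate token.

    Hypothetical helper: the naming scheme is an assumption, not the
    layout documented by MASS-supNMT.
    """
    wanted = set(langs)
    matches = []
    # Search recursively, in case the files sit in a subdirectory.
    for path in sorted(Path(data_dir).rglob(f"{split}*")):
        if not path.is_file():
            continue
        # Tokenize the file name so that e.g. 'en' is matched as a whole
        # code rather than as a substring of another word.
        tokens = set(re.split(r"[.\-_]", path.name))
        if wanted <= tokens:
            matches.append(path.relative_to(data_dir).as_posix())
    return matches
```

Calling `list_split_files(data_dir, "valid", ["ceb", "en"])` with the same directory that is passed to `fairseq-train` returns every validation file that mentions both languages. An empty list would mean that nothing on disk matches the split and pair reported in the error.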