prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit
MIT License
174 stars 32 forks

Error when trying to pretrain with language extensions other than hi #38

Closed raypretam closed 2 years ago

raypretam commented 2 years ago

The command we are using: python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model ai4bharat/IndicBART --tokenizer_name_or_path ai4bharat/IndicBART --langs kn --mono_src /home/aniruddha/all_data/train.kn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path aibharat/IndicBART/model --port 7878


```
Traceback (most recent call last):
  File "pretrain_nmt.py", line 970, in <module>
    run_demo()
  File "pretrain_nmt.py", line 967, in run_demo
    mp.spawn(model_create_load_run_save, nprocs=args.gpus, args=(args,files,train_files,))         #
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save
    lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id
  File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss
    smooth_loss.masked_fill_(pad_mask, 0.0)
RuntimeError: The expanded size of the tensor (333) must match the existing size (332) at non-singleton dimension 1. Target sizes: [8, 333, 1]. Tensor sizes: [8, 332, 1]
```


But when the data file has the ".hi" language extension, the code works fine.

prajdabre commented 2 years ago

Hi,

The problem is that when official models are used, you need to pass the language indicator tokens directly to the script.

So where you previously passed --langs hi,kn,bn, you now need to pass --langs "<2hi>,<2kn>,<2bn>".

Try it and let me know.
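
A quick way to sanity-check this before launching training (a minimal sketch, not from the thread; it assumes the Hugging Face transformers API and the loading options suggested on the ai4bharat/IndicBART model card):

```python
# Sketch: confirm that each language indicator passed via --langs exists as a
# single vocabulary entry in the official IndicBART tokenizer. An indicator
# that is not in the vocabulary gets split into subword pieces when the text
# is tokenized, which can shift sequence lengths and lead to mismatches like
# the one reported above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)

for indicator in ["<2hi>", "<2kn>", "<2bn>"]:
    token_id = tok.convert_tokens_to_ids(indicator)
    # A lookup that falls back to the unknown id means the token is not a
    # usable language indicator for --langs.
    status = "OK" if token_id != tok.unk_token_id else "NOT IN VOCAB"
    print(f"{indicator}: id={token_id} ({status})")
```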

raypretam commented 2 years ago

The above solution seems to work only for IndicBART, as it is not working with mbart-large-50. We tried the following two commands:

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs hi,kn,bn --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs "<2hi>,<2kn>,<2bn>" --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080


Both are giving the following error:

```
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Initial LR is: 1.25e-07
Training from official pretrained model
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:216: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:234: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Masking ratio: 0.3
Training for: ['hi_IN', 'kn_IN', 'bn_IN']
Shuffling corpus!
Shuffling corpus!
Shuffling corpus!
Saving the model
Loading from checkpoint
Traceback (most recent call last):
  File "pretrain_nmt.py", line 970, in <module>
    run_demo()
  File "pretrain_nmt.py", line 967, in run_demo
    mp.spawn(model_create_load_run_save, nprocs=args.gpus, args=(args,files,train_files,))         #
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save
    lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id
  File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss
    smooth_loss.masked_fill_(pad_mask, 0.0)
RuntimeError: The expanded size of the tensor (96) must match the existing size (94) at non-singleton dimension 1.  Target sizes: [8, 96, 1].  Tensor sizes: [8, 94, 1]
```

prajdabre commented 2 years ago

For mbart-50, you need to use language indicator tokens like hi_IN,kn_IN,bn_IN.
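
(A hedged aside, not from the original exchange: the language codes that the official mbart-large-50 tokenizer registers can be listed directly, which makes it easy to check a code before putting it in --langs. This sketch assumes the standard MBart50TokenizerFast class from Hugging Face transformers, which registers its language codes as additional special tokens.)

```python
# Sketch: list the language codes registered by the official mbart-large-50
# tokenizer and check the ones intended for --langs.
from transformers import MBart50TokenizerFast

tok = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")

# mBART-50 registers its language codes as additional special tokens.
supported = set(tok.additional_special_tokens)
print(sorted(supported))

for code in ["hi_IN", "kn_IN", "bn_IN"]:
    print(code, "is supported" if code in supported else "is NOT supported")
```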

raypretam commented 2 years ago

We have already tried the following command: python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs hi_IN,kn_IN,bn_IN --mono_src /home/aniruddha/yanmtt/mbart_data/train.hi_IN,/home/aniruddha/yanmtt/mbart_data/train.kn_IN,/home/aniruddha/yanmtt/mbart_data/train.bn_IN --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080

The same kind of error occurs.

prajdabre commented 2 years ago

Oh yeah, that's because Kannada is not supported by mbart-50; kn_IN does not exist. You may have to give up on Kannada, or reuse the token of another language (like ta_IN) to work with Kannada.
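
(An illustrative sketch, not from the thread: assuming the entries in --langs pair positionally with the files in --mono_src, as in the commands above, the Kannada file could be trained under the ta_IN token with something like the following.)

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs hi_IN,ta_IN,bn_IN --mono_src /home/aniruddha/yanmtt/mbart_data/train.hi_IN,/home/aniruddha/yanmtt/mbart_data/train.kn_IN,/home/aniruddha/yanmtt/mbart_data/train.bn_IN --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080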

raypretam commented 2 years ago

Yeah it's working now. I will get back if any issue occurs. Thanks 🙇‍♂️