Hi,
The problem is that when official models are used, you need to pass the language indicator tokens directly to the script.
So where you previously passed --langs hi,kn,bn, you now need to pass --langs "<2hi>,<2kn>,<2bn>"
Try it and let me know.
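If it helps, here is a minimal sketch (my own check, not part of the YANMTT scripts; the loading flags follow the IndicBART model card) to confirm that the tokenizer actually recognizes the <2xx> indicator tokens before you pass them via --langs:

```python
# Minimal sketch: an indicator token is only usable if it maps to a real id,
# i.e. something other than the <unk> id. Loading flags follow the IndicBART model card.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai4bharat/IndicBART", do_lower_case=False,
                                    use_fast=False, keep_accents=True)
for lang_tok in ["<2hi>", "<2kn>", "<2bn>"]:
    tok_id = tok.convert_tokens_to_ids(lang_tok)
    print(lang_tok, tok_id, "OK" if tok_id != tok.unk_token_id else "NOT RECOGNIZED")
```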
The above solution seems to work only for IndicBART; it is not working with mbart-large-50.
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs hi,kn,bn --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs "<2hi>,<2kn>,<2bn>" --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080
Both give the following error:
```
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
warnings.warn("To get the last learning rate computed by the scheduler, "
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Initial LR is: 1.25e-07
Training from official pretrained model
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:216: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
warnings.warn(SAVE_STATE_WARNING, UserWarning)
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:234: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
warnings.warn(SAVE_STATE_WARNING, UserWarning)
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Masking ratio: 0.3
Training for: ['hi_IN', 'kn_IN', 'bn_IN']
Shuffling corpus!
Shuffling corpus!
Shuffling corpus!
Saving the model
Loading from checkpoint
Traceback (most recent call last):
File "pretrain_nmt.py", line 970, in <module>
run_demo()
File "pretrain_nmt.py", line 967, in run_demo
mp.spawn(model_create_load_run_save, nprocs=args.gpus, args=(args,files,train_files,)) #
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save
lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id
File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss
smooth_loss.masked_fill_(pad_mask, 0.0)
RuntimeError: The expanded size of the tensor (96) must match the existing size (94) at non-singleton dimension 1. Target sizes: [8, 96, 1]. Tensor sizes: [8, 94, 1]
```
For mbart-50 you need to use the language indicator tokens like hi_IN,kn_IN,bn_IN
We have already tried the following command:
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model facebook/mbart-large-50 --tokenizer_name_or_path facebook/mbart-large-50 --langs hi_IN,kn_IN,bn_IN --mono_src /home/aniruddha/yanmtt/mbart_data/train.hi_IN,/home/aniruddha/yanmtt/mbart_data/train.kn_IN,/home/aniruddha/yanmtt/mbart_data/train.bn_IN --batch_size 8 --batch_size_indicates_lines --shard_files --model_path Facebook/Mbart-large-50/model --port 8080
We are getting the same kind of error.
Oh yeah, that's because Kannada is not supported by mbart-50; kn_IN does not exist. You may have to give up on Kannada, or reuse the token of another language such as ta_IN to work with Kannada.
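As a quick check (a minimal sketch of my own, not part of the YANMTT scripts), you can list the language codes that the official mbart-large-50 tokenizer ships with and confirm that kn_IN is not among them:

```python
# Minimal sketch: print the language codes supported by the official mBART-50 tokenizer.
from transformers import MBart50TokenizerFast

tok = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")
print(sorted(tok.lang_code_to_id.keys()))
print("kn_IN" in tok.lang_code_to_id)  # expected: False
```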
Yeah, it's working now. I will get back to you if any issue occurs. Thanks 🙇‍♂️
The command we are using: python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model ai4bharat/IndicBART --tokenizer_name_or_path ai4bharat/IndicBART --langs kn --mono_src /home/aniruddha/all_data/train.kn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path aibharat/IndicBART/model --port 7878
```
Traceback (most recent call last):
File "pretrain_nmt.py", line 970, in <module>
run_demo()
File "pretrain_nmt.py", line 967, in run_demo
mp.spawn(model_create_load_run_save, nprocs=args.gpus, args=(args,files,train_files,)) #
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save
lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id
File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss
smooth_loss.masked_fill_(pad_mask, 0.0)
RuntimeError: The expanded size of the tensor (333) must match the existing size (332) at non-singleton dimension 1. Target sizes: [8, 333, 1]. Tensor sizes: [8, 332, 1]
```
But when the data file has the ".hi" language extension, the code works fine.