microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Unable to load Zh-En Pre-trained Model for fine-tuning #153

Open riddlehk opened 4 years ago

riddlehk commented 4 years ago

Dear authors,

I used the provided script to fine-tune the zh-en pre-trained model. After setting up the GPUs, dictionaries, and binarized data, the following error message appeared:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/opt/conda/lib/python3.6/site-packages/fairseq_cli/train.py", line 265, in distributed_main
    main(args, init_distributed=True)
  File "/opt/conda/lib/python3.6/site-packages/fairseq_cli/train.py", line 68, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/opt/conda/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 107, in load_checkpoint
    reset_meters=args.reset_meters,
  File "/opt/conda/lib/python3.6/site-packages/fairseq/trainer.py", line 154, in load_checkpoint
    'Cannot load model parameters from checkpoint, '
Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match.

I noticed that the pre-trained model is much larger (6425203122 bytes) than the checkpoints I obtain when training from scratch (~3220000000 bytes). Any advice on how to load the pre-trained model successfully?
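For reference, a minimal way to compare the two checkpoints (the file names below are placeholders, not the actual paths) is to load them directly with `torch.load` and look at the stored architecture arguments and the number of parameter tensors; a mismatch there would be consistent with both the size difference and the "architectures match" error:

```python
import torch

# Placeholder file names; substitute the actual pre-trained model and your own checkpoint.
pretrained = torch.load("zhen_mass_pretrained.pt", map_location="cpu")
scratch = torch.load("checkpoint_from_scratch.pt", map_location="cpu")

for name, ckpt in [("pre-trained", pretrained), ("from scratch", scratch)]:
    # fairseq checkpoints are plain dicts; keys typically include 'args' and 'model'.
    print(name, "top-level keys:", list(ckpt.keys()))
    if "args" in ckpt:
        # Architecture name the checkpoint was saved with, if recorded.
        print(name, "arch:", getattr(ckpt["args"], "arch", None))
    if "model" in ckpt:
        # Number of parameter tensors in the saved state dict.
        print(name, "parameter tensors:", len(ckpt["model"]))
```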

ranggarppb commented 4 years ago

I've got this problem too. Is there any solution for this?