microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Problem while running Supervised NMT #162

Open RachitBansal opened 3 years ago

RachitBansal commented 3 years ago

I am trying to pre-train the supervised version of MASS NMT on my data, but am getting the following traceback:

  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/fairseq_cli/train.py", line 80, in main
    train(args, trainer, task, epoch_itr)
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/fairseq_cli/train.py", line 121, in train
    log_output = trainer.train_step(samples)
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/fairseq/trainer.py", line 289, in train_step
    raise e
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/fairseq/trainer.py", line 266, in train_step
    ignore_grad
  File "/data/orin-cdu/rbansal/Unsupervised-NMT-for-Sumerian-English/translation/MASS-snmt/mass/xmasked_seq2seq.py", line 402, in train_step
    forward_backward(model, sample[sample_key], sample_key, lang_pair)
  File "/data/orin-cdu/rbansal/Unsupervised-NMT-for-Sumerian-English/translation/MASS-snmt/mass/xmasked_seq2seq.py", line 383, in forward_backward
    loss, sample_size, logging_output = criterion(model, samples)
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/fairseq/criterions/label_smoothed_cross_entropy.py", line 38, in forward
    net_output = model(**sample['net_input'])
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/orin-cdu/rbansal/Unsupervised-NMT-for-Sumerian-English/translation/MASS-snmt/mass/xtransformer.py", line 154, in forward
    encoder_out = self.encoders[src_key](src_tokens, src_lengths)
  File "/data/orin-cdu/rbansal/newEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/orin-cdu/rbansal/Unsupervised-NMT-for-Sumerian-English/translation/MASS-snmt/mass/xtransformer.py", line 42, in forward
    if not encoder_padding_mask.any():
RuntimeError: CUDA error: device-side assert triggered

I have tried all versions of fairseq, including 0.7.1, but they all give the same error.

I also tried printing encoder_padding_mask; it looked something like this:

tensor([[False,  True,  True,  ..., False, False, False],
        [False, False, False,  ...,  True, False, False],
        [False,  True,  True,  ..., False,  True, False],
        ...,
        [False,  True, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False,  True,  ..., False, False, False]], device='cuda:0')
tensor([[False, False, False,  ...,  True, False, False],
        [False, False, False,  ...,  True, False, False],
        [False, False, False,  ..., False, False, False],
        ...,
        [False, False, False,  ..., False, False, False],
        [False,  True,  True,  ..., False, False, False],
        [False,  True,  True,  ...,  True, False, False]], device='cuda:0')
tensor([[False,  True, False,  ..., False, False, False],
        [False, False, False,  ...,  True, False, False],
        [False, False, False,  ...,  True,  True,  True],
        ...,
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ...,  True, False,  True],
        [False, False, False,  ..., False, False, False]], device='cuda:0')

What is going wrong? Please help.

StillKeepTry commented 3 years ago

Are there any empty lines in your data? It looks like an array index out of bounds.
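
A quick way to check for empty lines is a short script along these lines. This is only a minimal sketch: the function name `find_empty_lines` is hypothetical, and it assumes the training data is plain UTF-8 text with one sentence per line.

```python
from pathlib import Path


def find_empty_lines(path):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    empty = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # A line holding only spaces or tabs yields zero tokens as well,
            # so it is treated the same as a truly empty line.
            if not line.strip():
                empty.append(number)
    return empty
```

Running it on both the source and the target file of each language pair should show whether either side contains a sentence with no tokens.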

RachitBansal commented 3 years ago

Hey @StillKeepTry, I checked; there are no empty lines in the data.
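
With empty lines ruled out, another thing worth checking for an out-of-bounds index is whether every token id in a batch is below the vocabulary size, since an embedding lookup with a larger id can trigger a device-side assert on the GPU. CUDA errors are reported asynchronously, so the line named in the traceback (`encoder_padding_mask.any()`) is not necessarily where the bad index was used. The sketch below is only an illustration on plain Python lists: the function name and the toy batch are hypothetical, and in practice the same range check would be applied to the `src_tokens` tensor before it reaches the encoder.

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, sentence in enumerate(batch):
        for col, token_id in enumerate(sentence):
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, token_id))
    return bad


# Hypothetical toy batch: with a vocabulary of 10 entries the valid ids are 0..9,
# so the id 10 in the second sentence is out of range.
batch = [[4, 7, 2], [5, 10, 2]]
print(find_out_of_range_ids(batch, vocab_size=10))  # [(1, 1, 10)]
```

If a check like this reports any ids, one possible cause is that the binarized data and the dictionary loaded at training time do not match.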