microsoft / ProphetNet

A research project for natural language generation, containing the official implementations by MSRA NLC team.
MIT License

RuntimeError: unexpected EOF. Corrupted File? #13

Closed gouldju1 closed 4 years ago

gouldju1 commented 4 years ago

Hello,

I performed the following:

  1. Cloned the ProphetNet repository
  2. Installed torch and fairseq
  3. Downloaded the ProphetNet-large-160GB pre-trained model
  4. Downloaded the CNN/DM data
  5. Preprocessed the CNN/DM data via preprocess_cnn_dm.py
  6. Used fairseq-preprocess to generate binaries

When I run fairseq-train (training) or fairseq-generate (inference), I get the following errors:

Train

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 51, in main
    model = task.build_model(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 185, in build_model
    return models.build_model(args, self)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 147, in build_model
    states = torch.load(args.load_from_pretrained_model, map_location='cpu')
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 1092436 more bytes. The file might be corrupted.

Inference

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 5239485 more bytes. The file might be corrupted.
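An "unexpected EOF, expected N more bytes" from torch.load means the file on disk is shorter than its serialized header claims, which typically points to a truncated download or copy rather than a code bug. One way to confirm, assuming you have a published size or checksum to compare against, is a stdlib-only digest check:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size, h.hexdigest()

# Usage (the path below is the checkpoint from the commands in this issue;
# the reference size/checksum would come from wherever the model was published):
# size, digest = file_digest("prophetnet_large_pretrained_160G_14epoch_model.pt")
# print(size, digest)
```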

Inputs:

Train

fairseq-train \
--fp16 \
--user-dir ./prophetnet --task translation_prophetnet --arch ngram_transformer_prophet_large \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.0001 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion ngram_language_loss --label-smoothing 0.1 \
--update-freq 32  --max-sentences 2 \
--num-workers 4 \
--load-from-pretrained-model ../prophetnet_large_pretrained_160G_14epoch_model.pt \
--load-sep \
--ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 512 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--seed 1 \
--save-dir ./cnndm/finetune_cnndm_checkpoints \
--keep-last-epochs 10 \
--tensorboard-logdir ./cnndm/finetune_cnndm_tensorboard \
./cnndm/processed

Inference

fairseq-generate \
./cnndm/processed \
--path ../prophetnet_large_pretrained_16G_64epoch_model.pt \
--user-dir prophetnet \
--task translation_prophetnet \
--batch-size 32 \
--gen-subset test \
--beam 5 \
--num-workers 4 \
--min-len 45 \
--max-len-b 110 \
--no-repeat-ngram-size 3 --lenpen 1.2 2>&1 > ../logs.output

Any idea how to handle this? Thank you.

yuyan2do commented 4 years ago

Looks like the binary data is incomplete. Please check the sizes of your .bin and .idx files; reprocessing the data should resolve this issue.
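The check above can be automated. A quick sketch (plain Python; the directory layout is the standard fairseq-preprocess output, file names are examples) that walks the processed data directory and flags pairs where the .bin or .idx file is missing or empty:

```python
from pathlib import Path

def check_fairseq_binaries(data_dir):
    """Flag missing or zero-length .bin/.idx pairs under a fairseq data dir."""
    problems = []
    for bin_path in sorted(Path(data_dir).glob("*.bin")):
        idx_path = bin_path.with_suffix(".idx")
        if not idx_path.exists():
            problems.append(f"{bin_path.name}: missing {idx_path.name}")
        elif bin_path.stat().st_size == 0 or idx_path.stat().st_size == 0:
            problems.append(f"{bin_path.name}: zero-length bin or idx file")
    return problems

# Usage: check_fairseq_binaries("./cnndm/processed")
# An empty list means every pair is present and non-empty.
```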

gouldju1 commented 4 years ago

Yes, that resolved the issue. However, now, after entering an input sentence during fairseq-interactive, I get the following:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-interactive", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-interactive')()
  File "/workspace/fairseq/fairseq_cli/interactive.py", line 213, in cli_main
    main(args)
  File "/workspace/fairseq/fairseq_cli/interactive.py", line 164, in main
    translations = task.inference_step(generator, models, sample)
  File "/workspace/fairseq/fairseq/tasks/fairseq_task.py", line 356, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 161, in generate
    return self._generate(sample, **kwargs)
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 261, in _generate
    tokens[:, : step + 1], encoder_outs, self.temperature
  File "/workspace/fairseq/fairseq/sequence_generator.py", line 726, in forward_decoder
    incremental_state=self.incremental_states[i],
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 590, in forward
    x_list, extra = self.extract_features(prev_output_tokens, encoder_out, incremental_state, **unused)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 751, in extract_features
    real_positions=real_positions
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 365, in forward
    real_positions=real_positions
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 244, in forward
    saved_state = self._get_input_buffer(incremental_state)
  File "/workspace/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 418, in _get_input_buffer
    'attn_state',
  File "/workspace/fairseq/fairseq/utils.py", line 91, in get_incremental_state
    return module.get_incremental_state(incremental_state, key)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'NgramMultiheadAttention' object has no attribute 'get_incremental_state'
qiweizhen commented 4 years ago

@gouldju1 Hi, the "no attribute" error is caused by the fairseq version. The master branch of fairseq keeps changing its API, so we built ProphetNet against v0.9.0. Please pip install fairseq==0.9.0 and try whether that works.
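The version pin can be verified programmatically before launching a long run. A small sketch (the message strings are illustrative; lstrip handles a stray leading "v" in the reported version):

```python
def check_fairseq_version(expected="0.9.0"):
    """Report whether the installed fairseq matches the version ProphetNet targets."""
    try:
        import fairseq
    except ImportError:
        return f"fairseq is not installed; run: pip install fairseq=={expected}"
    installed = getattr(fairseq, "__version__", "unknown").lstrip("v")
    if installed != expected:
        return (f"found fairseq {installed}, but ProphetNet was built against "
                f"{expected}; run: pip install fairseq=={expected}")
    return f"fairseq {installed} OK"

print(check_fairseq_version())
```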

gouldju1 commented 4 years ago

Yes, that works. Thank you!