huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[s2s] distillation.py fails with apex #8632

Closed: stas00 closed this issue 3 years ago

stas00 commented 3 years ago

Splitting off from https://github.com/huggingface/transformers/pull/8631.

finetune.py works with apex, but distillation.py doesn't (no idea whether it ever did):

$ python distillation.py \
  --teacher facebook/bart-large-xsum --data_dir xsum \
  --tokenizer_name facebook/bart-large-xsum \
  --student_decoder_layers 6 --student_encoder_layers 12 \
  --freeze_encoder --freeze_embeds \
  --learning_rate=3e-4 \
  --do_train \
  --do_predict \
  --fp16 \
  --val_check_interval 0.1 --n_val 1 --eval_beams 1 --length_penalty=0.5 \
  --max_target_length=60 --val_max_target_length=60 --test_max_target_length=100 \
  --model_name_or_path IGNORED \
  --alpha_hid=3. \
  --train_batch_size=16 --eval_batch_size=16 --gradient_accumulation_steps=2 \
  --sortish_sampler \
  --num_train_epochs=6 \
  --warmup_steps 1 \
  --output_dir distilbart_xsum_12_6 \
  --amp_backend=apex \
  --n_train 1 \
  --gpus 1
[...]
2020-11-18 12:25:48.713431: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
using module SummarizationDistiller
/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: Checkpoint directory /mnt/nvme1/code/huggingface/transformers-s2s-dict/examples/seq2seq/distilbart_xsum_12_6 exists and is not empty. With save_top_k=1, all files in this directory will be deleted when a checkpoint is saved!
  warnings.warn(*args, **kwargs)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
Using APEX 16bit precision.
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "distillation.py", line 308, in <module>
    distill_main(args)
  File "distillation.py", line 299, in distill_main
    return ft_main(args, model=model)
  File "/mnt/nvme1/code/huggingface/transformers-s2s-dict/examples/seq2seq/finetune.py", line 409, in main
    trainer: pl.Trainer = generic_train(
  File "/mnt/nvme1/code/huggingface/transformers-s2s-dict/examples/lightning_base.py", line 398, in generic_train
    trainer.fit(model)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
    results = self.accelerator_backend.train()
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train
    results = self.train_or_test()
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 466, in train
    self.run_sanity_check(self.get_model())
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 658, in run_sanity_check
    _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 87, in validation_step
    output = self.__validation_step(args)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 95, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/mnt/nvme1/code/huggingface/transformers-s2s-dict/examples/seq2seq/finetune.py", line 182, in validation_step
    return self._generative_step(batch)
  File "/mnt/nvme1/code/huggingface/transformers-s2s-dict/examples/seq2seq/finetune.py", line 226, in _generative_step
    loss_tensors = self._step(batch)
  File "distillation.py", line 193, in _step
    teacher_outputs = self.teacher(
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/bart/modeling_bart.py", line 1022, in forward
    outputs = self.model(
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/bart/modeling_bart.py", line 905, in forward
    decoder_outputs = self.decoder(
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/bart/modeling_bart.py", line 593, in forward
    x, layer_self_attn, layer_past, layer_cross_attn = decoder_layer(
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/bart/modeling_bart.py", line 453, in forward
    x, cross_attn_weights = self.encoder_attn(
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/bart/modeling_bart.py", line 695, in forward
    k = self.k_proj(key)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/stas/anaconda3/envs/py38-pt16/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: expected scalar type Float but found Half
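Editorial note (not from the original report): the failing call is F.linear inside the teacher's k_proj, i.e. fp32 activations being multiplied against weights that apex O2 has cast to fp16. A minimal sketch reproducing the same class of dtype mismatch (the exact error message varies by PyTorch version):

import torch

# Stand-in repro (assumption, not the repository's code): fp16 weights, as apex O2
# produces, multiplied against fp32 activations coming from an unpatched code path.
weight = torch.randn(4, 4).half()   # model weight cast to fp16
x = torch.randn(2, 4)               # fp32 activation
try:
    x.matmul(weight.t())            # the same operation F.linear ends up in
except RuntimeError as e:
    print(e)                        # e.g. "expected scalar type Float but found Half"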

@patil-suraj, @patrickvonplaten
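A self-contained sketch (hypothetical, not the repository's actual fix) of the kind of guard that would avoid the mismatch: cast whatever tensors are handed to the frozen teacher to the teacher's own parameter dtype before the forward call.

import torch

# Hypothetical guard: keep the inputs of a frozen, fp16-cast teacher in the
# teacher's own parameter dtype. Runs only when a GPU is present, since apex
# fp16 training is GPU-only anyway.
if torch.cuda.is_available():
    teacher = torch.nn.Linear(8, 8).cuda().half()     # stand-in for the apex-O2-cast teacher
    hidden = torch.randn(2, 8, device="cuda")          # fp32 activations from the student path
    teacher_dtype = next(teacher.parameters()).dtype   # torch.float16
    with torch.no_grad():
        out = teacher(hidden.to(teacher_dtype))        # no "Float but found Half" error
    print(out.dtype)                                   # torch.float16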

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.