huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

transformers seems to have recently been "bricked" #13798

Closed · quantitative-technologies closed this issue 2 years ago

quantitative-technologies commented 2 years ago

Environment info

Who can help

@sgugger

Information

The example script below was working fine until today. I believe it was working in version 4.11.0.dev0. If you can tell me how to check out the source for 4.11.0.dev0 from GitHub, I will confirm that it works.
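
In case it helps: my understanding is that dev versions like 4.11.0.dev0 are not tagged on GitHub, so the way to test one is to install from source at a commit from around that time. A sketch (the commit hash below is a placeholder, not a real one):

git clone https://github.com/huggingface/transformers.git
cd transformers
git log --oneline          # pick a commit from the period in question
git checkout <commit-sha>  # placeholder: substitute the commit you want to test
pip install -e .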

To reproduce

Steps to reproduce the behavior:

On a high-RAM TPU Colab instance, run:

CHECKPOINT=bert-large-uncased
DATASET=rte
EPOCHS=2
BATCH_SIZE=16
LEARNING_RATE=3e-5

python transformers/examples/pytorch/xla_spawn.py --num_cores 8 \
  transformers/examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path $CHECKPOINT \
  --task_name $DATASET \
  --seed 10000 \
  --output_dir results \
  --overwrite_output_dir \
  --num_train_epochs $EPOCHS \
  --evaluation_strategy no \
  --logging_strategy epoch \
  --save_strategy epoch \
  --per_device_train_batch_size $BATCH_SIZE \
  --per_device_eval_batch_size $BATCH_SIZE \
  --learning_rate $LEARNING_RATE \
  --do_train

Gives the error:

Exception in device=TPU:7: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:4: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:2: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:1: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:6: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:5: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:3: zero-dimensional tensor (at position 0) cannot be concatenated
Exception in device=TPU:0: zero-dimensional tensor (at position 0) cannot be concatenated

  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data

uted/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 564, in _mp_fn
    main()
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 564, in _mp_fn
    main()
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 564, in _mp_fn
    main()
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
Traceback (most recent call last):
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 564, in _mp_fn
    main()
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/examples/pytorch/text-classification/run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1383, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 1467, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer.py", line 2373, in _nested_gather
    tensors = nested_xla_mesh_reduce(tensors, name)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/content/transformers/src/transformers/trainer_pt_utils.py", line 155, in nested_xla_mesh_reduce
    return xm.mesh_reduce(name, tensors, torch.cat)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/core/xla_model.py", line 916, in mesh_reduce
    return reduce_fn(xldata) if xldata else cpu_data
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
 50%|████████████▌             | 20/40 [08:22<08:22, 25.15s/it]
Traceback (most recent call last):
  File "transformers/examples/pytorch/xla_spawn.py", line 85, in <module>
    main()
  File "transformers/examples/pytorch/xla_spawn.py", line 81, in main
    xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 394, in spawn
    start_method=start_method)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 144, in join
    exit_code=exitcode
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 17

Expected behavior

No error.
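
For what it's worth, the failing call looks easy to reproduce in isolation: xm.mesh_reduce reduces the gathered per-process losses with torch.cat, and torch.cat refuses zero-dimensional tensors, which is exactly what the scalar tr_loss is. A quick sanity check of my own (not taken from the Trainer code):

# torch.cat cannot concatenate 0-d tensors, matching the error above:
python -c "import torch; torch.cat([torch.tensor(1.0)])"
# RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

# Reshaping the scalar to a 1-element tensor makes the same reduction work:
python -c "import torch; print(torch.cat([torch.tensor(1.0).reshape(1)]))"
# tensor([1.])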

sgugger commented 2 years ago

I see where the problem comes from. Will push a fix tonight or tomorrow morning, then we will do a patch release. In the meantime, you should see no error by staying on v4.10.
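
For example:

# Pin to the latest 4.10.x release until the patch release is out:
pip install "transformers==4.10.*"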

odellus commented 2 years ago

I run out of memory on transformers v4.X (X > 10) when training led-large-16384-arxiv with four gradient accumulation steps and a batch size of two, as in this notebook, on an A6000 with 48 GB of RAM. I had to bump gradient accumulation steps and batch size down to 1 each to fit the model + batch on the GPU. Wild. I don't really feel like opening a separate issue, but I thought I'd chirp in here and say that with v4.10.1 I can fit up to 8 samples per batch with four gradient accumulation steps on the A6000.
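
For concreteness, the knobs in question are the standard Trainer arguments, and the effective batch size is just their product (my numbers from above, restated):

# effective batch size = per_device_train_batch_size * gradient_accumulation_steps
--per_device_train_batch_size 2 --gradient_accumulation_steps 4   # notebook settings, effective 8: OOMs on >= 4.11
--per_device_train_batch_size 1 --gradient_accumulation_steps 1   # effective 1: what fits on >= 4.11
--per_device_train_batch_size 8 --gradient_accumulation_steps 4   # effective 32: fits on v4.10.1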

If you upgrade to 4.11.1 in the Colab notebook I shared, it fails; with 4.10.1 it works just fine.