huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

New `--log_level` feature introduces failures using 'passive' mode #12310

Closed: allenwang28 closed this issue 3 years ago

allenwang28 commented 3 years ago

Environment info

Who can help

@stas00 @sgugger

Information

Model I am using (Bert, XLNet ...): XLNet

The problem arises when using: the official example script (`run_glue.py`, launched via `xla_spawn.py` on a Cloud TPU).

The tasks I am working on are: the GLUE MNLI task.

This was captured by Cloud TPU tests (XLNet/MNLI/GLUE), but I think this behavior is model/dataset agnostic. Essentially, it seems that:

  1. The `TrainingArguments` `__post_init__` method should convert `log_level` to `-1` when it is set to `'passive'` (which it is by default).
  2. However, in the end-to-end `run_glue.py` example, `parse_args_into_dataclasses()` does not seem to call `__post_init__`, so the string `'passive'` reaches `logging.set_verbosity` and our tests fail with the traceback below (a sketch of the expected conversion follows the traceback):
    Traceback (most recent call last):
      File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
        _start_fn(index, pf_cfg, fn, args)
      File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
        fn(gindex, *args)
      File "/transformers/examples/pytorch/text-classification/run_glue.py", line 554, in _mp_fn
        main()
      File "/transformers/examples/pytorch/text-classification/run_glue.py", line 468, in main
        data_collator=data_collator,
      File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/trainer.py", line 295, in __init__
        logging.set_verbosity(log_level)
      File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/utils/logging.py", line 161, in set_verbosity
        _get_library_root_logger().setLevel(verbosity)
      File "/root/anaconda3/envs/pytorch/lib/python3.6/logging/__init__.py", line 1284, in setLevel
        self.level = _checkLevel(level)
      File "/root/anaconda3/envs/pytorch/lib/python3.6/logging/__init__.py", line 195, in _checkLevel
        raise ValueError("Unknown level: %r" % level)
    ValueError: Unknown level: 'passive'
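
For illustration, here is a minimal sketch of the conversion that `__post_init__` is expected to perform. The `LOG_LEVELS` mapping, the `resolve_log_level` helper, and the `-1` sentinel are assumptions made for this example, not the actual transformers implementation:

    import logging

    # Illustrative mapping from --log_level strings to logging constants.
    # 'passive' gets a -1 sentinel meaning "leave the current verbosity unchanged".
    LOG_LEVELS = {
        "debug": logging.DEBUG,
        "info": logging.INFO,
        "warning": logging.WARNING,
        "error": logging.ERROR,
        "critical": logging.CRITICAL,
        "passive": -1,
    }

    def resolve_log_level(log_level: str) -> int:
        """Translate a string --log_level value into an integer for downstream use."""
        return LOG_LEVELS[log_level]

    assert resolve_log_level("passive") == -1
    assert resolve_log_level("info") == logging.INFO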

To reproduce

Steps to reproduce the behavior:

  1. The command we're using is:
    git clone https://github.com/huggingface/transformers.git
    cd transformers && pip install .
    git log -1
    pip install datasets
    python examples/pytorch/xla_spawn.py \
      --num_cores 8 \
      examples/pytorch/text-classification/run_glue.py \
      --logging_dir=./tensorboard-metrics \
      --task_name MNLI \
      --cache_dir ./cache_dir \
      --do_train \
      --do_eval \
      --num_train_epochs 3 \
      --max_seq_length 128 \
      --learning_rate 3e-5 \
      --output_dir MNLI \
      --overwrite_output_dir \
      --logging_steps 30 \
      --save_steps 3000 \
      --overwrite_cache \
      --tpu_metrics_debug \
      --model_name_or_path xlnet-large-cased \
      --per_device_train_batch_size 32 \
      --per_device_eval_batch_size 16

Expected behavior

Training completes without error when `--log_level` is left at its default value of `passive`: the string should be translated to an integer sentinel (or otherwise handled) before it reaches `logging.set_verbosity`, instead of raising `ValueError: Unknown level: 'passive'`.
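
As a rough sketch of the consuming side (assuming the `-1` sentinel from the example above; `apply_log_level` is a hypothetical helper, not the actual `Trainer` code), the verbosity would only be changed when an explicit level was requested:

    from transformers.utils import logging

    def apply_log_level(log_level: int) -> None:
        # -1 stands for 'passive': keep whatever verbosity is already configured.
        if log_level != -1:
            logging.set_verbosity(log_level)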

stas00 commented 3 years ago

Thank you for the report; it will be fixed shortly via https://github.com/huggingface/transformers/pull/12309.

I'm just working on a test; I need another 10 minutes or so.

allenwang28 commented 3 years ago

Thank you for fixing this so quickly!