generic text classification with TensorFlow error (AttributeError: 'TFTrainingArguments' object has no attribute 'args')

c-col commented 3 years ago

Environment info

transformers version: 3.2.0
Platform: Linux-4.15.0-1091-oem-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.9
PyTorch version (GPU?): not installed (NA)
Tensorflow version (GPU?): 2.3.0 (True)
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help

@jplu

Information

Model I am using (Bert, XLNet ...): bert-base-multilingual-uncased

The problem arises when using:

[x] the official example scripts: (give details below)
[ ] my own modified scripts: (give details below)

Running run_tf_text_classification.py with flags from the example in the "Run generic text classification script in TensorFlow" section of examples/text-classification.

The task I am working on is:

[ ] an official GLUE/SQUaD task: (give the name)
[x] my own task or dataset: (give details below)

Text classification dataset for classifying answers to questions. It uses 3 CSVs (train, dev, and test) that each have the headers (class, text), with columns containing class labels (int) and questions (strings). For reference, the questions contain no commas.
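
As a rough illustration of the layout described above, the sketch below builds a two-column CSV with the headers (class, text) using only the standard library. The rows are hypothetical and invented purely to show the shape; they are not taken from the reporter's data.

```python
import csv
import io

# Hypothetical rows, invented only to show the (class, text) layout.
rows = [
    {"class": 0, "text": "what is the capital of France"},
    {"class": 1, "text": "how many legs does a spider have"},
]

# Write the header and rows into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["class", "text"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Read it back: column 0 holds the label, which is what
# --label_column_id 0 points at in the command used in this issue.
parsed = list(csv.reader(io.StringIO(csv_text)))
header, first_row = parsed[0], parsed[1]
```

With this layout the label sits in column index 0 and the text in column index 1.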

To reproduce

Steps to reproduce the behavior:

Call run_tf_text_classification.py with flags from the example in the "Run generic text classification script in TensorFlow" section of examples/text-classification:

python run_tf_text_classification.py \
--train_file train.csv \
--dev_file dev.csv \
--test_file test.csv \
--label_column_id 0 \
--model_name_or_path bert-base-multilingual-uncased \
--output_dir model \
--num_train_epochs 4 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 32 \
--do_train \
--do_eval \
--do_predict \
--logging_steps 10 \
--evaluate_during_training \
--save_steps 10 \
--overwrite_output_dir \
--max_seq_length 128

Error is encountered:

Traceback (most recent call last):
  File "run_tf_text_classification.py", line 283, in <module>
    main()
  File "run_tf_text_classification.py", line 199, in main
    training_args.n_replicas,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 936, in wrapper
    return func(*args, **kwargs)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/training_args_tf.py", line 180, in n_replicas
    return self._setup_strategy.num_replicas_in_sync
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 914, in __get__
    cached = self.fget(obj)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 936, in wrapper
    return func(*args, **kwargs)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/training_args_tf.py", line 122, in _setup_strategy
    if self.args.xla:
AttributeError: 'TFTrainingArguments' object has no attribute 'args'
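
The traceback suggests that a method on the arguments object reads `self.args.xla`, although the object is itself the arguments container and has no `args` attribute. The sketch below reproduces that kind of mistake with a hypothetical stand-in class (`SketchTrainingArguments` and its property names are assumptions for illustration, not the library's code) and shows the flag being read directly from `self` instead.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass
class SketchTrainingArguments:
    """Hypothetical stand-in for TFTrainingArguments; not the library class."""

    xla: bool = False

    @cached_property
    def buggy_strategy(self) -> str:
        # Mirrors the failing line: this object has no `args` attribute.
        if self.args.xla:
            return "xla"
        return "default"

    @cached_property
    def fixed_strategy(self) -> str:
        # Reads the flag from the arguments object itself.
        if self.xla:
            return "xla"
        return "default"


sketch_args = SketchTrainingArguments(xla=True)

error_message = None
try:
    sketch_args.buggy_strategy
except AttributeError as err:
    # Same shape of message as in the traceback above.
    error_message = str(err)

fixed_value = sketch_args.fixed_strategy
print(error_message)
print(fixed_value)
```

This only illustrates the failure mode; the actual change made on master may differ in detail.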

If the logger.info call is commented out (lines 197-202), the above error is prevented but another error is encountered:

Traceback (most recent call last):
  File "run_tf_text_classification.py", line 282, in <module>
    main()
  File "run_tf_text_classification.py", line 221, in main
    max_seq_length=data_args.max_seq_length,
  File "run_tf_text_classification.py", line 42, in get_tfds
    ds = datasets.load_dataset("csv", data_files=files)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/load.py", line 604, in load_dataset
    **config_kwargs,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/builder.py", line 158, in __init__
    **config_kwargs,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/builder.py", line 269, in _create_builder_config
    for key in sorted(data_files.keys()):
TypeError: '<' not supported between instances of 'NamedSplit' and 'NamedSplit'
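
This second error arises because `sorted()` needs the dictionary keys to be comparable with `<`, and the key type here apparently defines no ordering. The sketch below reproduces the same class of error with a hypothetical key class (`SketchSplit` is an assumption, not the `datasets` type) and shows that sorting by the string form of each key avoids it. It is an illustration of the cause, not the library's fix.

```python
class SketchSplit:
    """Hypothetical stand-in for a split key that defines no ordering."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __str__(self) -> str:
        return self.name


# File names taken from the command in this issue.
files = {
    SketchSplit("train"): "train.csv",
    SketchSplit("dev"): "dev.csv",
    SketchSplit("test"): "test.csv",
}

sort_error = None
try:
    sorted(files.keys())
except TypeError as err:
    # The keys cannot be compared with '<', so sorting fails.
    sort_error = str(err)

# Sorting on the string form of each key gives a well-defined order.
ordered_names = [str(key) for key in sorted(files, key=str)]
print(sort_error)
print(ordered_names)
```

Using plain strings as the dictionary keys in the first place would sidestep the comparison entirely.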

Here is a pip freeze:

absl-py==0.10.0
astunparse==1.6.3
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
dataclasses==0.7
datasets==1.0.2
dill==0.3.2
filelock==3.0.12
gast==0.3.3
google-auth==1.21.3
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
idna==2.10
importlib-metadata==2.0.0
joblib==0.16.0
Keras-Preprocessing==1.1.2
Markdown==3.2.2
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.4
pandas==1.1.2
protobuf==3.13.0
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
regex==2020.7.14
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
sacremoses==0.0.43
scipy==1.4.1
sentencepiece==0.1.91
six==1.15.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.0
tensorflow-estimator==2.3.0
termcolor==1.1.0
tokenizers==0.8.1rc2
tqdm==4.49.0
transformers==3.2.0
urllib3==1.25.10
Werkzeug==1.0.1
wrapt==1.12.1
xxhash==2.0.0
zipp==3.2.0

Expected behavior

Model begins to train on custom dataset.

jplu commented 3 years ago

Hello!

This is fixed in master.

sunnyville01 commented 3 years ago

@jplu Sorry, but I'm facing the same issue, and have version 3.2 installed. Can you please elaborate on how I might fix this? Thanks.

jplu commented 3 years ago

@sunnyville01 Just install the version on master with pip install git+https://github.com/huggingface/transformers.git

sunnyville01 commented 3 years ago

@jplu Thanks, that fixed it.

astromad commented 3 years ago

I am still facing this issue on Colab with !pip install git+https://github.com/huggingface/transformers.git

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in ()
     17     learning_rate=LEARNING_RATE
     18 )
---> 19 with training_argsTF.strategy.scope():
     20     modelTF = TFAutoModelForSequenceClassification.from_pretrained(
     21         model_args['model_name'],

4 frames
/usr/local/lib/python3.6/dist-packages/transformers/training_args_tf.py in _setup_strategy(self)
    120         logger.info("Tensorflow: setting up strategy")
    121
--> 122         if self.args.xla:
    123             tf.config.optimizer.set_jit(True)
    124

AttributeError: 'TFTrainingArguments' object has no attribute 'args'

jplu commented 3 years ago

Something must be wrong with your install process, because this bug is fixed in master.

astromad commented 3 years ago

My bad, I did not notice the "requirements already met" message. I updated the command to !pip install --upgrade git+https://github.com/huggingface/transformers.git

No more issue! Sorry.
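
When pip reports that the requirement is already met, the previously installed release stays in place, so it can help to check which version is actually importable before re-running the script. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# If this still prints 3.2.0, the release install was not replaced.
print("transformers:", installed_version("transformers"))
```

In a notebook, the runtime may also need to be restarted after reinstalling so that the new package is picked up.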

Santosh-Gupta commented 3 years ago

> Something must be wrong with your install process, because this bug is fixed in master.

The error seems to persist for me. I installed using !pip install git+https://github.com/huggingface/transformers.git and got the same error: TypeError: '<' not supported between instances of 'NamedSplit' and 'NamedSplit'

Here is a Colab notebook. You can do Runtime -> Run all and see the output of the last cell.

https://colab.research.google.com/drive/1r3XCKYA8RBtfYmU2jqHVJT-uTt1ii04S?usp=sharing

pvcastro commented 3 years ago

@jplu I'm also getting the same error, TypeError: '<' not supported between instances of 'NamedSplit' and 'NamedSplit'. I also ran the Colab notebook from @Santosh-Gupta and the error happened there too. My local environment is also based on the transformers master branch.

jplu commented 3 years ago

@pvcastro Can you please open a new issue with all the details so that we can reproduce it? This thread is closed and is about a different error.
