huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

evaluation in TFTrainer does not run on GPU #11590

Closed: rohanshingade closed this issue 3 years ago

rohanshingade commented 3 years ago

Environment info

Who can help

@patil-suraj @Rocketknight1

Information

I'm using TFT5ForConditionalGeneration for a masked language modelling task. During training, GPU utilisation is above 95%, but as soon as evaluation starts it drops to 0% and evaluation is slow. Even though the evaluate function is called inside strategy.scope(), it does not use the GPU.
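One way to check whether TensorFlow sees the GPU at all, and where individual ops end up, is sketched below. It assumes TensorFlow 2.x and is independent of TFTrainer; the small matmul is only a stand-in for a real evaluation step, so it shows the technique rather than diagnosing this particular setup.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means ops will run on CPU.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Ask TensorFlow to log the device each op is placed on.
# This should be enabled before the ops of interest are created.
tf.debugging.set_log_device_placement(True)

# Stand-in computation (not the real evaluation step): the resulting
# tensor's device string shows where the op actually ran.
x = tf.random.uniform((4, 4))
y = tf.matmul(x, x)
print("matmul ran on:", y.device)
```

If the device string reports a CPU while a GPU is listed as visible, the placement logs printed during evaluation are a reasonable next place to look.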

The problem arises when using:

I'm using the official TFTrainer example script, run_tf_glue.py, modified slightly to accept custom data input.

The task I am working on is:

The final train_dataset and eval_dataset (the inputs to TFTrainer) have the form ({"input_ids": , "attention_mask": , "decoder_attention_mask": }, labels)
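To make that layout concrete, here is a minimal pure-Python sketch of a single example in this form, together with a sanity check. The helper `check_example` is hypothetical (it is not part of transformers), the token ids are made up, and the length rules it enforces are assumptions about a typical encoder-decoder setup rather than something stated in this issue.

```python
from typing import Dict, Sequence, Tuple

# Feature keys named in the issue; each example is (features, labels).
EXPECTED_KEYS = {"input_ids", "attention_mask", "decoder_attention_mask"}

Example = Tuple[Dict[str, Sequence[int]], Sequence[int]]


def check_example(example: Example) -> None:
    """Hypothetical helper: raise ValueError if the layout looks wrong."""
    features, labels = example
    missing = EXPECTED_KEYS - set(features)
    if missing:
        raise ValueError(f"missing feature keys: {sorted(missing)}")
    # Assumption: the encoder mask covers exactly the encoder input tokens.
    if len(features["input_ids"]) != len(features["attention_mask"]):
        raise ValueError("input_ids and attention_mask lengths differ")
    # Assumption: the decoder mask has one entry per label token.
    if len(features["decoder_attention_mask"]) != len(labels):
        raise ValueError("decoder_attention_mask and labels lengths differ")


# Toy example with made-up token ids; 0 stands in for padding.
example = (
    {
        "input_ids": [101, 7, 8, 0],
        "attention_mask": [1, 1, 1, 0],
        "decoder_attention_mask": [1, 1, 0],
    },
    [5, 6, 0],
)
check_example(example)  # passes silently when the layout matches
```

A check like this can be run over a few examples before handing the datasets to a trainer, so that shape mismatches surface early instead of inside the training loop.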

To reproduce

Steps to reproduce the behavior:

I tried reproducing the error using run_tf_squad.py and run_tf_glue.py, but both scripts failed because the inputs to the trainer were not compatible. Only the MRPC task worked, but it had only 400 examples in evaluation, so it is hard to tell whether the GPU was used. The rest simply failed with an error.

If possible, I would like to contribute to TFTrainer by making evaluation run on the GPU and by processing the SQuAD and GLUE datasets so that their dimensions match TFTrainer's inputs. Any guidance would be really appreciated.

Expected behavior

Rocketknight1 commented 3 years ago

Hi, TF maintainer here! We're currently in the process of rewriting the examples. We're deprecating TFTrainer and using more native Keras. Rewriting our GLUE examples is coming up very soon on my to-do list.

That said, the run_text_classification.py script is updated to our new standards - feel free to try that and just adapt the input data to use GLUE instead of your own inputs. If that doesn't work for you, I'll try to get the real GLUE script done soon!

mymusise commented 3 years ago

Hello, @Rocketknight1, great job, looking forward to the new version of the real GLUE script.

I found that TFTrainer is still used in run_text_classification.py. Will TFTrainer be abandoned?

Rocketknight1 commented 3 years ago

Are you sure? I can't see it anywhere: https://github.com/huggingface/transformers/blob/master/examples/tensorflow/text-classification/run_text_classification.py

mymusise commented 3 years ago

Sorry, you're right! I was looking at a different file: https://github.com/huggingface/transformers/blob/master/examples/legacy/text-classification/run_tf_text_classification.py

Thank you, I will refer to it.