huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Training TFBertForSequenceClassification with custom X and Y data #3075

Closed rahulg963 closed 4 years ago

rahulg963 commented 4 years ago

I am working on a text classification problem, for which I am trying to train my model with TFBertForSequenceClassification from the huggingface-transformers library.

I followed the example given on their GitHub page and am able to run the sample code with the given sample data using tensorflow_datasets.load('glue/mrpc'). However, I am unable to find an example of how to load my own custom data and pass it to model.fit(train_dataset, epochs=2, steps_per_epoch=115, validation_data=valid_dataset, validation_steps=7).
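For reference, the steps_per_epoch and validation_steps values in that call are consistent with the usual rule of dataset size divided by batch size, rounded up. The sketch below is only an illustration: the example counts (3668 training and 408 validation pairs for MRPC) and the batch sizes (32 and 64) are assumptions that are not stated in this thread, and `steps_for` is a hypothetical helper name.

```python
import math


def steps_for(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# Assumed sizes and batch sizes; replace them with those of your own data.
train_steps = steps_for(3668, 32)
valid_steps = steps_for(408, 64)
print(train_steps, valid_steps)  # prints "115 7"
```

With custom data, the same calculation applies to the number of rows in your own training and validation sets. Keras only needs these arguments when the dataset repeats indefinitely or its length cannot be determined.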

How can I define my own X, tokenize it, and prepare train_dataset from my X and Y? Here X represents my input text and Y represents the classification category of the given X.

Sample Training dataframe :

       text                                               category_index
    0  Assorted Print Joggers - Pack of 2 ,/ Gray Pri...  0
    1  "Buckle" ( Matt ) for 35 mm Width Belt             0
    2  (Gagam 07) Barcelona Football Jersey Home 17 1...  2
    3  (Pack of 3 Pair) Flocklined Reusable Rubber Ha...  1
    4  (Summer special Offer)Firststep new born baby ...  0

Question already asked on SO:
https://stackoverflow.com/questions/60463829/training-tfbertforsequenceclassification-with-custom-x-and-y-data

papapabi commented 4 years ago

Maybe this is a little late, but you could take a look at both examples/run_tf_glue.py and this function from src/transformers/data/processors/glue.py, and write a custom training script based on those.
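As a rough illustration of what such a custom script could look like, here is a minimal sketch. It is not the code from run_tf_glue.py or glue.py. It assumes a transformers release that still ships the TensorFlow classes and whose tokenizers are callable (3.0 or later), a DataFrame with the text and category_index columns shown in the question, and the helper names `split_rows`, `make_dataset`, and `fine_tune` are hypothetical. The learning rate, max_length, and batch sizes are illustrative defaults, not recommendations from this thread.

```python
import random


def split_rows(texts, labels, valid_fraction=0.1, seed=0):
    """Shuffle (text, label) pairs and split them into train and validation lists."""
    pairs = list(zip(texts, labels))
    random.Random(seed).shuffle(pairs)
    n_valid = max(1, int(len(pairs) * valid_fraction))
    return pairs[n_valid:], pairs[:n_valid]


def make_dataset(pairs, tokenizer, max_length=128, batch_size=32, shuffle=False):
    """Tokenize (text, label) pairs and wrap them in a batched tf.data.Dataset."""
    import tensorflow as tf  # imported lazily so split_rows works without TensorFlow

    texts = [text for text, _ in pairs]
    labels = [label for _, label in pairs]
    # Pad or truncate every text to max_length so all rows have the same shape.
    encodings = tokenizer(
        texts,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="tf",
    )
    dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels))
    if shuffle:
        dataset = dataset.shuffle(len(pairs))
    return dataset.batch(batch_size)


def fine_tune(df, num_labels, epochs=2):
    """Fine-tune bert-base-cased on df["text"] (X) and df["category_index"] (Y)."""
    import tensorflow as tf
    from transformers import BertTokenizer, TFBertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = TFBertForSequenceClassification.from_pretrained(
        "bert-base-cased", num_labels=num_labels
    )

    train_pairs, valid_pairs = split_rows(
        df["text"].tolist(), df["category_index"].tolist()
    )
    train_dataset = make_dataset(train_pairs, tokenizer, shuffle=True)
    valid_dataset = make_dataset(valid_pairs, tokenizer, batch_size=64)

    # The model returns raw logits, so the loss is configured with from_logits=True.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    # The datasets are finite, so steps_per_epoch and validation_steps can be omitted.
    model.fit(train_dataset, validation_data=valid_dataset, epochs=epochs)
    return model, tokenizer
```

Calling `fine_tune(df, num_labels=n)` with `n` set to the number of distinct category_index values would run the whole pipeline. Depending on your TensorFlow and Keras versions, the compile and fit calls may need adjusting.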

papapabi commented 4 years ago

To make things a little more concrete, I've written and annotated an end-to-end example of how to fine-tune a bert-base-cased model on data that follows your DataFrame's spec. Do comment if it helps you out!

rahulg963 commented 4 years ago

@papapabi Thank you for your inputs. I will check this out.
