Hey @helloworld123-lab,
Thanks for the issue :-) Is there a specific reason to use Bert2bert for SWAG instead of just a BERT model?
I am sorry for the issue :) Actually, I am new to this field; I just started working with transformer models. Since T5 is a text-to-text model, I wanted to try how bert2bert would perform in the same setup. Is this the wrong approach for SWAG?
Hello everyone,
I am trying to build a multiple-choice QA system using Bert2Bert, following the SWAG-with-T5 recipe in https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb. My complete code is here: https://colab.research.google.com/drive/1MAGCi5TC1S6GNW3CFEB0f2cMkQ5gpxdN?usp=sharing
To integrate the bert2bert model, I followed this notebook: https://colab.research.google.com/drive/1Ekd5pUeCX7VOrMx94_czTkwNtLN32Uyu?usp=sharing
I created a Bert2BertFineTuner class modeled on the T5FineTuner class in https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb and made the following changes to it for bert2bert:
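A minimal sketch of what these changes can look like, assuming the standard EncoderDecoderModel API from transformers and bert-base-uncased checkpoints (not necessarily the exact code used here):

```python
# Sketch only: ties a BERT encoder and a BERT decoder together as a
# seq2seq (bert2bert) model, replacing the T5 model/config/tokenizer.
from transformers import BertTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The shared config needs the special-token ids set before generation works.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```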
As above, I updated the model, config, and tokenizer for the bert2bert model. A sample encoded input/target pair looks like this:
Input:  [CLS] context : in what spanish speaking north american country can you get a great cup of coffee? options : 1 : mildred's coffee shop 2 : mexico 3 : diner 4 : kitchen 5 : canteen </s> [SEP] [PAD] [PAD] [PAD] [PAD] [PAD]
Target: [CLS] 2 </s> [SEP]
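A sketch of how such a pair could be produced, following the T5 notebook's encoding style; the exact field layout and max lengths here are assumptions:

```python
# Sketch: the T5 recipe appends "</s>" as a literal string, while
# BertTokenizer adds [CLS]/[SEP] itself, which is why both appear in the
# decoded pair above.
input_text = (
    "context: in what spanish speaking north american country can you get "
    "a great cup of coffee? options: 1: mildred's coffee shop 2: mexico "
    "3: diner 4: kitchen 5: canteen </s>"
)
target_text = "2 </s>"

inputs = tokenizer(input_text, max_length=512, padding="max_length",
                   truncation=True, return_tensors="pt")
targets = tokenizer(target_text, max_length=8, padding="max_length",
                    truncation=True, return_tensors="pt")
```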
In the pair above, 2 indicates the label (the index of the correct option). I ran the model with the following parameters:
{'output_dir': 't5_swag', 'model_name_or_path': 'bert2bert', 'tokenizer_name_or_path': 'bert-base', 'max_seq_length': 512, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_epsilon': 1e-08, 'warmup_steps': 0, 'train_batch_size': 8, 'eval_batch_size': 8, 'num_train_epochs': 4, 'gradient_accumulation_steps': 16, 'n_gpu': 1, 'early_stop_callback': False, 'fp_16': False, 'opt_level': 'O1', 'max_grad_norm': 1.0, 'seed': 42, 'data_dir': ''}
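Following the T5FineTuner notebook's pattern, these arguments are wrapped in an argparse.Namespace and handed to a PyTorch Lightning trainer. A minimal sketch of that wiring, assuming the notebook's setup, where args_dict is the dictionary above and Bert2BertFineTuner is the class described earlier:

```python
# Sketch of the training wiring implied by the notebook pattern.
import argparse
import pytorch_lightning as pl

args = argparse.Namespace(**args_dict)  # args_dict is the dict above

train_params = dict(
    accumulate_grad_batches=args.gradient_accumulation_steps,
    gpus=args.n_gpu,
    max_epochs=args.num_train_epochs,
    gradient_clip_val=args.max_grad_norm,
)

model = Bert2BertFineTuner(args)
trainer = pl.Trainer(**train_params)
trainer.fit(model)
```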
Training finishes with the following loss values:
For the validation part:
metrics.accuracy_score(targets1, outputs1) returns 0.20065520065520065, which is essentially chance level for five options.
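For context, a rough sketch of the generate-and-compare loop that call implies; targets1 and outputs1 are the names from above, while val_loader, the batch keys, and the generation settings are assumptions:

```python
# Sketch of the evaluation implied above, not the exact code used.
import torch
from sklearn import metrics

model.eval()
outputs1, targets1 = [], []
with torch.no_grad():
    for batch in val_loader:
        generated = model.generate(
            batch["source_ids"],
            attention_mask=batch["source_mask"],
            max_length=4,
        )
        outputs1 += tokenizer.batch_decode(generated, skip_special_tokens=True)
        targets1 += tokenizer.batch_decode(batch["target_ids"],
                                           skip_special_tokens=True)

print(metrics.accuracy_score(targets1, outputs1))  # ~0.20, chance for 5 options
```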