nyu-mll / jiant-v1-legacy

The jiant toolkit for general-purpose text understanding models
MIT License

Got RuntimeError: size mismatch when trying to train RoBERTa with SuperGLUE #1041

Open jeswan opened 4 years ago

jeswan commented 4 years ago

Issue by valerio65xz Friday Mar 20, 2020 at 17:15 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/1041


Hello! I'm new to jiant, and I just want to train a RoBERTa-base model on all the SuperGLUE tasks and then fine-tune on the ReCoRD task.

Here is my tutorial.conf:

// This imports the defaults, which can be overridden below.
include "defaults.conf"  // relative path to this file

// write to local storage by default for this demo
exp_name = jiant-demo
run_name = mtl-sst-mrpc

cuda = 0
random_seed = 42

load_model = 1
reload_tasks = 0
reload_indexing = 0
reload_vocab = 0

pretrain_tasks = "superglue"
target_tasks = "record"

// because record is in pretrain tasks
do_pretrain = 1
do_target_task_training = 0

classifier = mlp
classifier_hid_dim = 32
max_seq_len = 10
max_word_v_size = 1000
pair_attn = 0

input_module = roberta-base
d_word = 50

sent_enc = bow
skip_embs = 0

batch_size = 8

lr = 0.0003

val_interval = 50
max_vals = 10
target_train_val_interval = 10
target_train_max_vals = 10

// Use += to inherit from any previously-defined task tuples.
// This part is used to pass custom parameters to the model, so I should figure out
// what to put here, e.g. for RoBERTa.
//sts-b += {
//  classifier_hid_dim = 512
//  pair_attn = 0
//  max_vals = 16
//  val_interval = 10
//}

After launching

python main.py --config_file jiant/config/tutorial.conf --overrides "exp_name = robertabase_superglue_record, run_name = run1"

I wait several minutes, then I get this error:

03/20 11:58:04 AM: Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 588, in main
    phase="pretrain",
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 671, in train
    all_val_metrics, should_save, new_best = self._validate(n_val, tasks, batch_size)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 953, in _validate
    task, task_infos, tasks, batch_size, all_val_metrics, n_examples_overall
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 843, in _calculate_validation_performance
    out = self._forward(batch, task=task)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 855, in forward
    out = self._pair_sentence_forward(batch, task, predict)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 1010, in _pair_sentence_forward
    logits = classifier(sent, mask, [batch["idx1"], batch["idx2"]])
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/simple_modules.py", line 149, in forward
    logits = self.classifier(final_emb)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/simple_modules.py", line 83, in forward
    logits = self.classifier(seq_emb)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/functional.py", line 1352, in linear
    ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [8 x 2048], m2: [1536 x 32] at /opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THC/generic/THCTensorMathBlas.cu:266

How can I fix this? Thanks!
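(A reading of the error for anyone landing here: in F.linear the two operands are the incoming activations and the transposed weight matrix, so m1: [8 x 2048] is [batch_size x actual_input_dim] and m2: [1536 x 32] is [expected_input_dim x classifier_hid_dim]. In other words, the MLP pair classifier was built expecting 1536-dimensional inputs but is being fed 2048-dimensional ones, which lines up with the diagnosis in the next comment.)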

jeswan commented 4 years ago

Comment by sleepinyourhat Saturday Mar 21, 2020 at 13:48 GMT


Hrm, it looks like the input dimension to the final classifier is being set incorrectly. That would likely be at one of the classifier = lines like this: https://github.com/nyu-mll/jiant/blob/master/jiant/models.py#L677

I don't have an immediate guess what would cause this, and there's a good chance that there's a bug.

@pruksmhc @zphang @pyeres @W4ngatang - Any guesses what would cause that mismatch?

For now, though, a few things to try: make sure you're on the latest version of jiant with all dependencies installed, check that the demo config runs cleanly on your setup, and try listing the SuperGLUE tasks one by one instead of using the "superglue" shortcut.

Sorry you're dealing with this. Stay well!

jeswan commented 4 years ago

Comment by valerio65xz Saturday Mar 21, 2020 at 19:58 GMT


I've tried training with demo.conf, and everything ran fine; it finished without any error. I have the latest version of jiant, conda is installed with all the dependencies (I ran setup.py to install them), and I've selected the tasks one by one as you suggested. But I still get the error :(

jeswan commented 4 years ago

Comment by W4ngatang Saturday Mar 21, 2020 at 20:48 GMT


Can you try setting sent_enc = 'none' instead of bow?
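For reference, one way to try this without editing the config file is to append it to the overrides string from the original command (a sketch, assuming the overrides parser treats sent_enc like any other key):

python main.py --config_file jiant/config/tutorial.conf \
    --overrides "exp_name = robertabase_superglue_record, run_name = run1, sent_enc = none"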


jeswan commented 4 years ago

Comment by valerio65xz Saturday Mar 21, 2020 at 23:09 GMT


Now it gives me this:

03/21 11:38:35 PM: Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 558, in main
    model = build_model(args, vocab, word_embs, tasks, cuda_device)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 292, in build_model
    args, vocab, d_emb, tasks, embedder, cove_layer
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 208, in build_sent_encoder
    "skip_embs is false and sent_enc is none, "
  File "/usr/home/studenti/sp193030/jiant/jiant/utils/utils.py", line 484, in assert_for_log
    assert condition, error_message
AssertionError: skip_embs is false and sent_enc is none, which means that your token representations are zero-dimensional. Consider setting skip_embs.
Traceback (most recent call last):
  File "main.py", line 27, in <module>
    raise e  # re-raise exception, in case debugger is attached.
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 558, in main
    model = build_model(args, vocab, word_embs, tasks, cuda_device)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 292, in build_model
    args, vocab, d_emb, tasks, embedder, cove_layer
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 208, in build_sent_encoder
    "skip_embs is false and sent_enc is none, "
  File "/usr/home/studenti/sp193030/jiant/jiant/utils/utils.py", line 484, in assert_for_log
    assert condition, error_message
AssertionError: skip_embs is false and sent_enc is none, which means that your token representations are zero-dimensional. Consider setting skip_embs.

In my defaults.conf, skip_embs = 1
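(Worth noting: the tutorial.conf posted at the top of this thread sets skip_embs = 0, and since that assignment comes after include "defaults.conf", it overrides the value from defaults.conf, so the effective setting is still false. A minimal sketch of the fix, assuming nothing later overrides it again, is to flip it in tutorial.conf itself:

sent_enc = "none"
skip_embs = 1    // overrides the skip_embs = 0 set earlier; HOCON also accepts true
)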

jeswan commented 4 years ago

Comment by pruksmhc Saturday Mar 21, 2020 at 23:36 GMT


In that case, try setting skip_embs = True. Best, Yada


jeswan commented 4 years ago

Comment by valerio65xz Sunday Mar 22, 2020 at 13:37 GMT


And now I get this:

03/22 01:19:55 PM: Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 588, in main
    phase="pretrain",
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 579, in train
    output_dict = self._forward(batch, task=task)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 881, in forward
    out = self._multiple_choice_reading_comprehension_forward(batch, task, predict)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 1241, in _multiple_choice_reading_comprehension_forward
    ex_embs, ex_mask = self.sent_encoder(inp, task)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/sentence_encoder.py", line 104, in forward
    assert (word_embs_in_context is not None) or (task_word_embs_in_context is not None)
AssertionError
Traceback (most recent call last):
  File "main.py", line 27, in <module>
    raise e  # re-raise exception, in case debugger is attached.
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 588, in main
    phase="pretrain",
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 579, in train
    output_dict = self._forward(batch, task=task)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 881, in forward
    out = self._multiple_choice_reading_comprehension_forward(batch, task, predict)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 1241, in _multiple_choice_reading_comprehension_forward
    ex_embs, ex_mask = self.sent_encoder(inp, task)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/sentence_encoder.py", line 104, in forward
    assert (word_embs_in_context is not None) or (task_word_embs_in_context is not None)
AssertionError

Anyway, I have another question: every time I start a training run, it downloads the model again, and since I'm using the university server, the connection is very slow:

03/22 12:45:09 PM: https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin not found in cache or force_download set to True, downloading to /tmp/tmpaou3j81k

I've tried setting force_download = False, but nothing changed.
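(A side note on the repeated downloads: the log shows the weights being fetched into /tmp, which suggests no persistent cache directory is configured, so each run starts from a cold cache. Depending on your jiant and transformers versions, pointing the Hugging Face cache at a persistent path before launching should make the download a one-time cost; the variable names below are assumptions to verify against your own environment setup script:

export HUGGINGFACE_TRANSFORMERS_CACHE=/path/to/persistent/cache    # name used in jiant v1's environment setup, if your version reads it
export PYTORCH_TRANSFORMERS_CACHE=/path/to/persistent/cache        # read by the transformers 2.x library itself
)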

jeswan commented 4 years ago

Comment by mirzakhalov Thursday Mar 26, 2020 at 02:11 GMT


This worked fine for my experiment!

// Data and preprocessing settings
max_seq_len = 256 // Mainly needed for MultiRC, to avoid over-truncating
                  // But not 512 as that is really hard to fit in memory.

// Model settings
input_module = "roberta-large"
transformers_output_mode = "top"
pair_attn = 0 // shouldn't be needed but JIC
s2s = {
    attention = none
}
sent_enc = "none"
sep_embs_for_skip = 1
classifier = log_reg // following BERT paper
transfer_paradigm = finetune // finetune entire BERT model
jeswan commented 4 years ago

Comment by valerio65xz Friday Mar 27, 2020 at 17:02 GMT



I tried that config, but I think it's too much for my system:

03/27 05:41:21 PM: Fatal error in main():
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 588, in main
    phase="pretrain",
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 579, in train
    output_dict = self._forward(batch, task=task)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 881, in forward
    out = self._multiple_choice_reading_comprehension_forward(batch, task, predict)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 1241, in _multiple_choice_reading_comprehension_forward
    ex_embs, ex_mask = self.sent_encoder(inp, task)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/sentence_encoder.py", line 98, in forward
    self._text_field_embedder(sent, task._classifier_name)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/huggingface_transformers_interface/modules.py", line 353, in forward
    _, output_pooled_vec, hidden_states = self.model(ids, attention_mask=input_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 740, in forward
    encoder_attention_mask=encoder_extended_attention_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 386, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 366, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 328, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 133, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 45.50 MiB (GPU 0; 11.91 GiB total capacity; 10.96 GiB already allocated; 31.38 MiB free; 22.80 MiB cached)
Traceback (most recent call last):
  File "main.py", line 27, in <module>
    raise e  # re-raise exception, in case debugger is attached.
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "/usr/home/studenti/sp193030/jiant/jiant/__main__.py", line 588, in main
    phase="pretrain",
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 579, in train
    output_dict = self._forward(batch, task=task)
  File "/usr/home/studenti/sp193030/jiant/jiant/trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 881, in forward
    out = self._multiple_choice_reading_comprehension_forward(batch, task, predict)
  File "/usr/home/studenti/sp193030/jiant/jiant/models.py", line 1241, in _multiple_choice_reading_comprehension_forward
    ex_embs, ex_mask = self.sent_encoder(inp, task)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/modules/sentence_encoder.py", line 98, in forward
    self._text_field_embedder(sent, task._classifier_name)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/jiant/jiant/huggingface_transformers_interface/modules.py", line 353, in forward
    _, output_pooled_vec, hidden_states = self.model(ids, attention_mask=input_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 740, in forward
    encoder_attention_mask=encoder_extended_attention_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 386, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 366, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 328, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/usr/home/studenti/sp193030/.conda/envs/jiant/lib/python3.6/site-packages/transformers/modeling_bert.py", line 133, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 45.50 MiB (GPU 0; 11.91 GiB total capacity; 10.96 GiB already allocated; 31.38 MiB free; 22.80 MiB cached)

jeswan commented 4 years ago

Comment by sleepinyourhat Friday Mar 27, 2020 at 20:58 GMT


Yes—your GPU is out of memory. Reducing either batch_size or max_seq_len should help.
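For example, a lighter run can be launched with overrides rather than editing the config; the numbers here are only illustrative starting points, not tuned values:

python main.py --config_file jiant/config/tutorial.conf \
    --overrides "exp_name = robertabase_superglue_record, run_name = run1, batch_size = 4, max_seq_len = 128"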

jeswan commented 4 years ago

Comment by valerio65xz Friday Apr 03, 2020 at 00:26 GMT


OK then, sorry, but I've got some questions, because I don't understand how to replicate the correct results.

This is my tutorial.conf now:

`include "defaults.conf"

// write to local storage by default for this demo exp_name = jiant-demo run_name = mtl-sst-mrpc

cuda = 0 random_seed = 42

load_model = 1 reload_tasks = 0 reload_indexing = 0 reload_vocab = 0

pretrain_tasks = "boolq,commitbank,copa,multirc,rte-superglue,wic,winograd-coreference" target_tasks = "record"

classifier = log_reg classifier_hid_dim = 32 max_seq_len = 256 max_word_v_size = 1000 pair_attn = 0

input_module = roberta-base

transformers_output_mode = "top"

s2s = { attention = none }

sep_embs_for_skip = 1

transfer_paradigm = finetune // finetune entire BERT model

d_word = 50 sent_enc = none skip_embs = True batch_size = 8 lr = 0.0003

val_interval = 50 max_vals = 10 target_train_val_interval = 10 target_train_max_vals = 10`
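(A side observation, assuming these options behave as in the stock trainer: with val_interval = 50 and max_vals = 10, pretraining runs for at most 50 × 10 = 500 batches, and with target_train_val_interval = 10 and target_train_max_vals = 10, target training for at most 100 batches of 8 examples each. That is only a small fraction of the ReCoRD training set, which on its own would keep scores far below the leaderboard numbers.)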

I've used roberta-base instead of large because of memory and time issues. I would like to get a ReCoRD score of about 0.9, as on the SuperGLUE leaderboard. My plan was to pre-train over all the SuperGLUE tasks and then fine-tune on ReCoRD, but I got these scores:

04/03 02:02:42 AM: Task 'record': sorting predictions by 'idx'
04/03 02:02:42 AM: Finished evaluating on: record
04/03 02:02:42 AM: Writing results for split 'val' to ./checkpoints/robertabase_superglue_record/results.tsv
04/03 02:02:42 AM: micro_avg: 0.160, macro_avg: 0.160, record_f1: 0.164, record_em: 0.156, record_avg: 0.160
04/03 02:02:42 AM: Done!

They are much worse than the results on the SuperGLUE leaderboard.

Am I doing something wrong? Thanks in advance!

jeswan commented 4 years ago

Comment by W4ngatang Tuesday May 05, 2020 at 02:09 GMT


Hi Valerio,

I haven't run roberta-base myself, so I couldn't tell you if those results are what you should expect. However, given that roberta-large gets right around 90 on the leaderboard, I think it's quite unlikely you'll be able to get similar results with the base model. If you want to try additional pretraining, I'd recommend other QA tasks, instead of the other SuperGLUE tasks (except MultiRC, but the task format is somewhat different). Hope that helps!