plkmo / BERT-Relation-Extraction

PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
Apache License 2.0

Why is my f1 so low? #8

Closed shark803 closed 4 years ago

shark803 commented 4 years ago

We used your ALBERT pretrained checkpoint files and then fine-tuned. The F1 is 0.5978749489170413. Why is the value so low? We only used the default values of the args.

plkmo commented 4 years ago

Hi, thanks for bringing up this point. I have tried fine-tuning MTB-pretrained ALBERT on the SemEval task and compared it against training ALBERT directly on SemEval. Indeed, the F1 score is worse when MTB pretraining is used with ALBERT, so I guess ALBERT's architecture may not be suitable for MTB. I have updated the repo with the results for reference.

shark803 commented 4 years ago

Thank you for your kind response. We find that the BERT uncased pretrained model performs better on SemEval. We pretrained MTB for just 2 epochs and got an F1 of 0.764. We are trying to pretrain for more epochs on another server, but the code only seems to support one GPU during pretraining. How can I modify the code to support more GPUs?

plkmo commented 4 years ago

Generally, as presented in the paper, MTB has diminishing returns as the amount of fine-tuning data increases. So, practically, you might not need so many epochs (or even MTB at all) if you are fine-tuning on a lot of data.

I do not have multiple GPUs so I can't support this, but it should just be a matter of adding a few lines for data parallelism (torch multiprocessing / `nn.DataParallel`), which you can look up. A rough sketch is below.
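A minimal, self-contained sketch of the simplest option, `torch.nn.DataParallel`, which splits each batch across all visible GPUs within a single process. The model, optimizer, and dummy batch here are toy placeholders, not the repo's actual MTB pretraining code; in practice you would wrap the network built in the pretraining script the same way. For better scaling, `torch.nn.parallel.DistributedDataParallel` with `torch.multiprocessing` is the usual recommendation, but it needs more setup.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model standing in for the repo's BERT/ALBERT + MTB network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

if torch.cuda.device_count() > 1:
    # Replicates the model on each visible GPU and scatters every batch along dim 0.
    model = nn.DataParallel(model)
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for the real dataloader output.
inputs = torch.randn(32, 128).to(device)
labels = torch.randint(0, 2, (32,)).to(device)

optimizer.zero_grad()
outputs = model(inputs)            # forward pass is scattered across the GPUs
loss = criterion(outputs, labels)
loss.backward()                    # gradients are gathered back onto the default GPU
optimizer.step()
```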

plkmo commented 4 years ago

Hi, please re-clone the repo to get the revised MTB model, as the previous implementation was not correct (#9).