plkmo / BERT-Relation-Extraction

PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
Apache License 2.0

About relation and entity labelling in pretraining #20

Closed wailoktam closed 3 years ago

wailoktam commented 3 years ago

Hi, you mention a pretrained BERT model as a requirement, but you also do your own pretraining, right?

The downloadable pretrained model is trained on a corpus that does not come with labels for relation extraction.

It sounds like you do your pretraining on text whose entity and relation labels are generated automatically by spaCy, and so may not be 100% correct. Am I right about this?

For the training (I suppose this is fine-tuning), you use the SemEval dataset, whose entity and relation labels are supposed to be manually checked and 100% correct.

I suppose you have also tried the downloadable pretrained model, which does not get any labels for entities and relations. How much worse is the performance in that case?

Thanks.

plkmo commented 3 years ago

The original BERT pretrained on a text corpus + MTB pretraining gives better results after fine-tuning, compared to just the original BERT pretrained on a text corpus.
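For context on what the MTB pretraining step adds on top of plain BERT: the paper wraps the two entity mentions in marker tokens and, with some probability (α = 0.7 in the paper), replaces each mention with a `[BLANK]` token, so the model must match relation statements without relying on the surface forms of the entities. Below is a minimal sketch of that input construction — a hypothetical helper, not this repo's actual code — assuming the entity spans have already been found (e.g. by spaCy's NER during the automatic labelling):

```python
import random

def mark_and_blank(tokens, e1_span, e2_span, blank_prob=0.7, rng=random):
    """Sketch of MTB input construction (hypothetical helper, not the repo's code).

    Wraps the two entity mentions in [E1]...[/E1] / [E2]...[/E2] markers and,
    with probability blank_prob (alpha = 0.7 in the paper), replaces each
    mention with a single [BLANK] token. Spans are (start, end) token indices,
    end-exclusive, assumed non-overlapping.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i == e1_span[0]:
            out.append("[E1]")
        if i == e2_span[0]:
            out.append("[E2]")
        if e1_span[0] <= i < e1_span[1]:
            # Emit the whole mention once, at its first token: either a
            # [BLANK] placeholder or the mention verbatim.
            if i == e1_span[0]:
                out.extend(["[BLANK]"] if rng.random() < blank_prob
                           else tokens[e1_span[0]:e1_span[1]])
        elif e2_span[0] <= i < e2_span[1]:
            if i == e2_span[0]:
                out.extend(["[BLANK]"] if rng.random() < blank_prob
                           else tokens[e2_span[0]:e2_span[1]])
        else:
            out.append(tok)
        if i == e1_span[1] - 1:
            out.append("[/E1]")
        if i == e2_span[1] - 1:
            out.append("[/E2]")
    return out

# blank_prob=0.0 keeps the mentions; blank_prob=1.0 always blanks them.
tokens = "Bill Gates founded Microsoft in 1975".split()
print(mark_and_blank(tokens, (0, 2), (3, 4), blank_prob=0.0))
print(mark_and_blank(tokens, (0, 2), (3, 4), blank_prob=1.0))
```

The marker tokens (and `[BLANK]`) are added to the tokenizer vocabulary as special tokens before pretraining, so they each get their own embedding.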