shiningboy123 closed this issue 4 years ago
If you read the paper in more detail, you will see that we are using noise-contrastive estimation (NCE) here. As the authors mention, it is intractable to compute the pairwise dot products in eq. 1 to compare the embeddings of every possible pair of relation statements. Instead, we abstract out the pairwise probabilities with sigmoid activations after applying NCE (with a uniform noise distribution). You can refer to the dataset class for the specific details.
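For intuition, the NCE trick can be sketched roughly as follows (a minimal numpy sketch, not the repo's actual code; all names and values below are my own illustration): instead of a softmax over all pairs, each positive pair and a few uniformly sampled noise pairs are scored with a sigmoid over their dot product and trained as a binary discrimination problem.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(pos_score, neg_scores):
    """Binary NCE loss: the positive pair should score towards 1,
    each uniformly sampled negative pair towards 0."""
    pos_term = -np.log(sigmoid(pos_score))
    neg_term = -np.sum(np.log(1.0 - sigmoid(neg_scores)))
    return pos_term + neg_term

# Toy relation-statement embeddings (hypothetical values)
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)  # similar statement
negatives = rng.normal(size=(4, 8))           # uniform noise samples

pos_score = anchor @ positive      # dot product with the positive
neg_scores = negatives @ anchor    # dot products with the noise
loss = nce_loss(pos_score, neg_scores)
```

This avoids ever materialising the full pairwise comparison in eq. 1; only a handful of noise samples per positive are scored.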
https://github.com/plkmo/BERT-Relation-Extraction/blob/master/src/model/BERT/modeling_bert.py#L750 The positive samples seem not to contribute to the loss, since the value of Q is 0.
https://github.com/plkmo/BERT-Relation-Extraction/blob/master/src/preprocessing_funcs.py#L265 As far as I understand, in the paper the positive/negative label applies to a pair of relation statements. In your code, however, you treat a single statement as a positive/negative sample.
I went back and rechecked the implementation in more detail, and yes, you're right. Thanks for pointing out this error. I have corrected it in the latest commit: we now compare each relation statement with the others after NCE batching, implemented in the loss calculation.
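As a rough sketch of what such a pairwise comparison after batching could look like (hypothetical code, not the actual commit; I assume a label of 1 when two statements share the same entity pair, 0 otherwise):

```python
import numpy as np

def pairwise_mtb_loss(embeddings, labels):
    """Binary cross-entropy over sigmoid(dot product) for every
    distinct pair of relation-statement embeddings in the batch."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            score = embeddings[i] @ embeddings[j]       # dot product
            p = 1.0 / (1.0 + np.exp(-score))            # sigmoid
            y = 1.0 if labels[i] == labels[j] else 0.0  # same entity pair?
            p = min(max(p, 1e-7), 1 - 1e-7)             # avoid log(0)
            total += -(y * np.log(p) + (1 - y) * np.log(1 - p))
            count += 1
    return total / count

# Toy batch: first two statements share an entity pair, third does not
emb = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]])
labels = [0, 0, 1]
loss = pairwise_mtb_loss(emb, labels)
```

The key point is that the loss is defined over statement pairs, matching the paper's setup, rather than over single statements.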
I cannot find any code for the MTB loss, such as the dot product performed before the binary classifier.