shiningboy123 closed this issue 4 years ago
If you read the paper in more detail, you will see that we are using noise-contrastive estimation (NCE) here. As the authors mention, it is intractable to compute the pairwise dot products in eq. 1 to compare the embeddings of every possible pair of relation statements. Instead, we abstract out the pairwise probabilities with sigmoid activations after applying NCE (with a uniform noise distribution). You can refer to the dataset class for the specific details.
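For intuition, the NCE trick can be sketched roughly as follows (a minimal numpy sketch, not the repo's actual code; all names and values below are my own illustration): instead of a softmax over all pairs, each positive pair and a few uniformly sampled noise pairs are scored with a sigmoid over their dot product and trained as a binary discrimination problem.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(pos_score, neg_scores):
    """Binary NCE loss: the positive pair should score towards 1,
    each uniformly sampled negative pair towards 0."""
    pos_term = -np.log(sigmoid(pos_score))
    neg_term = -np.sum(np.log(1.0 - sigmoid(neg_scores)))
    return pos_term + neg_term

# Toy relation-statement embeddings (hypothetical values)
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)  # similar statement
negatives = rng.normal(size=(4, 8))           # uniform noise samples

pos_score = anchor @ positive      # dot product with the positive
neg_scores = negatives @ anchor    # dot products with the noise
loss = nce_loss(pos_score, neg_scores)
```

This avoids ever materialising the full pairwise comparison in eq. 1; only a handful of noise samples per positive are scored.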
https://github.com/plkmo/BERT-Relation-Extraction/blob/master/src/model/BERT/modeling_bert.py#L750 The positive samples seem not to contribute to the loss, since the value of Q is 0.
https://github.com/plkmo/BERT-Relation-Extraction/blob/master/src/preprocessing_funcs.py#L265 As far as I understand, in the paper the positive/negative label applies to a pair of relation statements. In your code, however, you treat a single statement as a positive/negative sample.
I went back and rechecked the implementation in more detail, and yes, you're right. Thanks for pointing out this error. I have corrected it in the latest commit: we now compare each relation statement with the others after NCE batching, implemented in the loss calculation.
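As a rough sketch of what such a pairwise comparison after batching could look like (hypothetical code, not the actual commit; I assume a label of 1 when two statements share the same entity pair, 0 otherwise):

```python
import numpy as np

def pairwise_mtb_loss(embeddings, labels):
    """Binary cross-entropy over sigmoid(dot product) for every
    distinct pair of relation-statement embeddings in the batch."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            score = embeddings[i] @ embeddings[j]       # dot product
            p = 1.0 / (1.0 + np.exp(-score))            # sigmoid
            y = 1.0 if labels[i] == labels[j] else 0.0  # same entity pair?
            p = min(max(p, 1e-7), 1 - 1e-7)             # avoid log(0)
            total += -(y * np.log(p) + (1 - y) * np.log(1 - p))
            count += 1
    return total / count

# Toy batch: first two statements share an entity pair, third does not
emb = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]])
labels = [0, 0, 1]
loss = pairwise_mtb_loss(emb, labels)
```

The key point is that the loss is defined over statement pairs, matching the paper's setup, rather than over single statements.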
I cannot find any code for the MTB loss, such as the dot product performed before the binary classifier.