We essentially filter spans during sampling: only ground truth entity mentions are used to train the relation classifier. This is also stated in the paper: "To train the relation classifier, we use ground truth relations as positive samples, and draw N_r negative samples from those entity pairs S_gt × S_gt that are not labeled with any relation"
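For illustration, here is a minimal sketch of that sampling step. The names (`gt_entities`, `gt_relations`, `neg_rel_count`) are hypothetical and not SpERT's actual identifiers; this is just the logic described in the quote, under those assumptions:

```python
import random

def sample_negative_relations(gt_entities, gt_relations, neg_rel_count):
    """Draw negative relation samples from ground truth entity pairs.

    gt_entities: list of ground truth entity mentions (hashable spans)
    gt_relations: set of (head, tail) pairs labeled with a gold relation
    neg_rel_count: number of negatives to draw (N_r in the paper)
    """
    # All ordered pairs of ground truth entities (S_gt x S_gt) ...
    candidates = [(h, t) for h in gt_entities for t in gt_entities
                  if h != t and (h, t) not in gt_relations]
    # ... that are not labeled with any relation become negative candidates.
    return random.sample(candidates, min(neg_rel_count, len(candidates)))
```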
I assume this answers your question; please leave a comment otherwise.
Wow, sorry about that. Thanks for your help!
@markus-eberts When sampling negative relations, would further restricting the head/tail entity types help performance? I.e., don't sample [LOC] spans/entities for the relation [PEOPLE]-(WORK FOR)-[ORG].
Hi @liebkne, I also have this idea in mind and it is certainly worth a try. I would not omit these samples entirely (since the model still encounters them during inference), but reducing their number might help performance. Samples whose entity types do not match any relation are usually easy for the model to tell apart, so there is probably no need to sample them often. It would be great to hear from you in case you try it :).
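One way to try this would be to downweight, rather than drop, type-incompatible pairs. A rough sketch under assumed names (`TYPE_MAP`, the tuple-based entity encoding, and the weights are all hypothetical, not part of SpERT):

```python
import random

# Hypothetical mapping from relation type to admissible (head, tail) entity types.
TYPE_MAP = {"WORK_FOR": ("PEOPLE", "ORG"), "LOCATED_IN": ("ORG", "LOC")}

def type_compatible(head_type, tail_type):
    # A pair is "compatible" if at least one relation admits its type signature.
    return (head_type, tail_type) in TYPE_MAP.values()

def sample_negatives_weighted(candidate_pairs, neg_rel_count, incompatible_weight=0.1):
    """Downweight (rather than drop) type-incompatible pairs, so the model
    still sees some of them, as it will during inference.

    candidate_pairs: list of ((head_span, head_type), (tail_span, tail_type))
    """
    weights = [1.0 if type_compatible(h_type, t_type) else incompatible_weight
               for ((_, h_type), (_, t_type)) in candidate_pairs]
    # random.choices samples with replacement, which is fine for a sketch.
    return random.choices(candidate_pairs, weights=weights, k=neg_rel_count)
```

The same `TYPE_MAP` could also be used at inference time to filter candidate pairs before relation classification, as discussed below.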
@markus-eberts Thanks for your reply. If we can apply this 'relation type to entity types' mapping during training, I think we can also apply it during inference? I might try that next week.
I have one more question about the NER part. I noticed that in my case the NER part almost always has much higher recall than precision (the correct entity is found, but so are some false positives; boundary errors are also more common than type errors), which I think is due to the span prediction approach. In a real application we need a way to keep only the correct span. Do you have any suggestions on this (during training/application)? Thank you!
Hi @liebkne,
sorry for my late reply. Of course you could also apply the mapping during inference to filter pairs. Regarding the NER part: yes, we also found boundary errors to occur more frequently than type errors (depending on the dataset). You can try to oversample spans that the model predicts incorrectly. For example, in JEREX (https://github.com/lavis-nlp/jerex) we oversample negative spans that overlap with ground truth mentions, since these are usually the hardest to distinguish.
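As a rough illustration of that oversampling idea (a sketch under assumed names and span encoding; see the JEREX repository for the actual implementation):

```python
import random

def overlaps(span, other):
    # Spans are (start, end) token offsets with an exclusive end.
    return span[0] < other[1] and other[0] < span[1]

def sample_negative_spans(neg_spans, gt_spans, count, overlap_boost=3):
    """Oversample negative spans that overlap a ground truth mention,
    since those boundary-error candidates are the hardest to reject."""
    weights = [overlap_boost if any(overlaps(s, g) for g in gt_spans) else 1
               for s in neg_spans]
    return random.choices(neg_spans, weights=weights, k=count)
```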