We essentially filter spans during sampling: only ground truth entity mentions are used to train the relation classifier. This is also stated in the paper: "To train the relation classifier, we use ground truth relations as positive samples, and draw N_r negative samples from those entity pairs S_gt × S_gt that are not labeled with any relation"
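For illustration, here is a minimal sketch of that sampling step. The names (`gt_entities`, `gt_relations`, `neg_rel_count`) are hypothetical and not SpERT's actual identifiers; this is just the logic described in the quote, under those assumptions:

```python
import random

def sample_negative_relations(gt_entities, gt_relations, neg_rel_count):
    """Draw negative relation samples from ground truth entity pairs.

    gt_entities: list of ground truth entity mentions (hashable spans)
    gt_relations: set of (head, tail) pairs labeled with a gold relation
    neg_rel_count: number of negatives to draw (N_r in the paper)
    """
    # All ordered pairs of ground truth entities (S_gt x S_gt) ...
    candidates = [(h, t) for h in gt_entities for t in gt_entities
                  if h != t and (h, t) not in gt_relations]
    # ... that are not labeled with any relation become negative candidates.
    return random.sample(candidates, min(neg_rel_count, len(candidates)))
```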
I assume this answers your question; please leave a comment otherwise.
Wow, sorry about that. Thanks for your help!
@markus-eberts When sampling negative relations, would further restricting the head/tail entity types help performance? I.e., don't sample [LOC] spans/entities for the relation [PEOPLE]-(WORK FOR)-[ORG].
Hi @liebkne, I also have this idea in mind and it is certainly worth a try. I would not omit these samples entirely (since the model still encounters them during inference), but reducing their number might help performance. Samples whose entity types do not match any relation are usually easy for the model to tell apart, so there is probably no need to sample them often. It would be great to hear from you in case you try it :).
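One way to try this would be to downweight, rather than drop, type-incompatible pairs. A rough sketch under assumed names (`TYPE_MAP`, the tuple-based entity encoding, and the weights are all hypothetical, not part of SpERT):

```python
import random

# Hypothetical mapping from relation type to admissible (head, tail) entity types.
TYPE_MAP = {"WORK_FOR": ("PEOPLE", "ORG"), "LOCATED_IN": ("ORG", "LOC")}

def type_compatible(head_type, tail_type):
    # A pair is "compatible" if at least one relation admits its type signature.
    return (head_type, tail_type) in TYPE_MAP.values()

def sample_negatives_weighted(candidate_pairs, neg_rel_count, incompatible_weight=0.1):
    """Downweight (rather than drop) type-incompatible pairs, so the model
    still sees some of them, as it will during inference.

    candidate_pairs: list of ((head_span, head_type), (tail_span, tail_type))
    """
    weights = [1.0 if type_compatible(h_type, t_type) else incompatible_weight
               for ((_, h_type), (_, t_type)) in candidate_pairs]
    # random.choices samples with replacement, which is fine for a sketch.
    return random.choices(candidate_pairs, weights=weights, k=neg_rel_count)
```

The same `TYPE_MAP` could also be used at inference time to filter candidate pairs before relation classification, as discussed below.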
@markus-eberts Thanks for your reply. If we can apply this 'relation type to entity types' mapping during training, I think we can also apply it during inference? I might try that next week.
I have one more question about the NER part. I noticed that in my case the NER part almost always has much higher recall than precision (the correct entity is found, but so are some false positives; boundary errors are also more common than type errors), which I think is due to the span prediction approach. In a real application we need a way to keep only the correct span. Do you have any suggestions on this (during training/application)? Thank you!
Hi @liebkne,
sorry for my late reply. Of course you could also apply the mapping during inference to filter pairs. Regarding the NER part: yes, we also found boundary errors to occur more frequently than type errors (depending on the dataset). You can try to oversample spans that the model predicts incorrectly. For example, in JEREX (https://github.com/lavis-nlp/jerex) we oversample negative spans that overlap with ground truth mentions, since these are usually the hardest to distinguish.
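As a rough illustration of that oversampling idea (a sketch under assumed names and span encoding; see the JEREX repository for the actual implementation):

```python
import random

def overlaps(span, other):
    # Spans are (start, end) token offsets with an exclusive end.
    return span[0] < other[1] and other[0] < span[1]

def sample_negative_spans(neg_spans, gt_spans, count, overlap_boost=3):
    """Oversample negative spans that overlap a ground truth mention,
    since those boundary-error candidates are the hardest to reject."""
    weights = [overlap_boost if any(overlaps(s, g) for g in gt_spans) else 1
               for s in neg_spans]
    return random.choices(neg_spans, weights=weights, k=count)
```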