relatio-nlp / relatio

code base for constructing narrative statements from text
MIT License

Use phrase-BERT for entity encoding #49

Closed: elliottash closed this issue 2 years ago

elliottash commented 2 years ago

Add Phrase-BERT as an option for encoding the entities before clustering: https://aclanthology.org/2021.emnlp-main.846.pdf

aplamada commented 2 years ago

Should we use the implementation mentioned in the paper, https://github.com/sf-wa-326/phrase-bert-topic-model?

PinchOfData commented 2 years ago

spaCy has a RoBERTa implementation: https://spacy.io/models/en (see en_core_web_trf). relatio-v0.3 has support for spaCy embeddings, so this could be the easiest approach.

muhark commented 2 years ago

spaCy's en_core_web_trf model could be used with an SBERT-style mean-pooling operation to approximate the Phrase-BERT embeddings, but this wouldn't benefit from SBERT's contrastive pre-training or Phrase-BERT's phrase-level pre-training.
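
A rough sketch of that mean-pooling approximation (the `doc._.trf_data` tensor layout here is an assumption about the classic spacy-transformers pipeline, and the example phrase is illustrative):

```python
# Sketch: mean-pool en_core_web_trf's wordpiece activations into a
# fixed-size phrase vector. Assumes spacy-transformers, which stores
# activations on doc._.trf_data; tensors[0] is assumed to hold
# wordpiece activations of shape (n_spans, seq_len, hidden).
import numpy as np
import spacy

nlp = spacy.load("en_core_web_trf")

def phrase_vector(phrase: str) -> np.ndarray:
    """Mean-pool RoBERTa wordpiece activations into one phrase vector."""
    doc = nlp(phrase)
    wordpieces = doc._.trf_data.tensors[0]
    # Crude mean over spans and tokens; it also averages in special
    # tokens, which is part of why this only approximates Phrase-BERT.
    return wordpieces.mean(axis=(0, 1))

vec = phrase_vector("the central bank")
print(vec.shape)  # (768,) for the roberta-base backbone
```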

It would add dependencies to the project, but a straightforward solution may be to use the checkpoint uploaded to the Hugging Face hub:

https://huggingface.co/whaleloops/phrase-bert

I'm experimenting with this implementation anyway, as transformers is already a dependency in my project.
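
For reference, a minimal sketch of that route, assuming the checkpoint loads through sentence-transformers as described on its model card (the phrases and cluster count below are illustrative, not relatio's actual API):

```python
# Sketch: encode candidate entity phrases with Phrase-BERT from the
# Hugging Face hub, then cluster them, mirroring the encode-then-cluster
# step this issue targets.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("whaleloops/phrase-bert")
phrases = ["central bank", "monetary policy", "the federal reserve", "rate hike"]
embeddings = model.encode(phrases)  # ndarray of shape (4, 768)

# These vectors would then feed the entity-clustering step.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(phrases, labels)))
```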