Closed — elliottash closed this issue 2 years ago
Should we use the implementation mentioned in the paper? https://github.com/sf-wa-326/phrase-bert-topic-model
spaCy has a RoBERTa implementation: https://spacy.io/models/en (see en_core_web_trf). relatio-v0.3 already supports spaCy embeddings, so this could be the easiest approach.
spaCy's en_core_web_trf model could be combined with SBERT's mean-pooling operation to approximate the Phrase-BERT embeddings, but this would not benefit from SBERT's contrastive pre-training or from Phrase-BERT's phrase-level pre-training.
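The mean-pooling operation itself is simple. A minimal NumPy sketch, assuming contextual token vectors have already been produced for one phrase (e.g. by en_core_web_trf); the token matrix and attention mask below are dummy stand-ins for real transformer output:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors into one phrase vector, ignoring padding.

    token_embeddings: (seq_len, dim) contextual vectors for one phrase.
    attention_mask:   (seq_len,) 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    count = np.clip(mask.sum(), a_min=1e-9, a_max=None)            # avoid div-by-zero
    return summed / count

# Dummy stand-in for transformer output: 4 tokens (last one padding), dim 6.
tokens = np.arange(24, dtype=np.float64).reshape(4, 6)
mask = np.array([1, 1, 1, 0])
phrase_vec = mean_pool(tokens, mask)  # averages only the 3 real tokens
```

This mirrors the pooling SBERT applies on top of a transformer; only the source of the token vectors (spaCy vs. transformers) would differ.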
It would add dependencies to the project, but a straightforward solution may be to use the model uploaded to the Hugging Face Hub:
https://huggingface.co/whaleloops/phrase-bert
I'm experimenting with this implementation anyway, since transformers is already a dependency in my project.
Add Phrase-BERT as an option for encoding the entities before clustering: https://aclanthology.org/2021.emnlp-main.846.pdf
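Whichever encoder is chosen, the downstream step is the same: embed each entity phrase, then cluster the vectors. A minimal sketch with random vectors standing in for Phrase-BERT embeddings (in practice these would come from e.g. whaleloops/phrase-bert) and a tiny NumPy k-means; relatio's actual clustering interface may differ:

```python
import numpy as np

def kmeans(vectors: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means with deterministic farthest-first initialisation.

    Returns one cluster label per row of `vectors`.
    """
    centroids = [vectors[0]]
    for _ in range(k - 1):
        # Next centroid: the point farthest from all centroids chosen so far.
        dists = np.min([np.linalg.norm(vectors - c, axis=1) for c in centroids], axis=0)
        centroids.append(vectors[dists.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster emptied.
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Stand-in embeddings: two well-separated blobs of "entity phrase" vectors.
rng = np.random.default_rng(42)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 8)),
])
labels = kmeans(embeddings, k=2)  # first 10 rows share one label, last 10 the other
```

The point of swapping in Phrase-BERT would be that paraphrase-like entity strings land near each other in this vector space, so the clusters group them together.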