swisscom / ai-research-keyphrase-extraction

EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings (official implementation)
Apache License 2.0

How to extract keyphrases using Doc2Vec instead of Sent2Vec? #8

Closed amit-dat closed 6 years ago

amit-dat commented 6 years ago

Hello, I've been trying to run EmbedRank with Doc2Vec. What changes to the config.ini file, and what setup steps, are needed to use Doc2Vec instead of the sent2vec technique mentioned in the repo description?

Thanks!

kamilbs commented 6 years ago

Hello, you'll have to create your own EmbeddingDistributor, similar to EmbeddingDistributorLocal (see the embeddings package). All you have to implement is a method that takes a list of sentences as input and returns an array of shape (number of sentences x embedding dimension). To write this method, you can look for a Python interface to Doc2Vec. Finally, to extract keyphrases:

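import launch
from swisscom_ai.research_keyphrase.embeddings.your_custom_distrib import YourDoc2VecDistributor

# Placeholder input; replace with the document you want keyphrases for.
raw_text = 'Text of the document to analyse.'
embedding_distributor = YourDoc2VecDistributor(...)  # pass your own constructor arguments
pos_tagger = launch.load_local_pos_tagger('en')
# Extract the top 10 keyphrases from the English text.
kp1 = launch.extract_keyphrases(embedding_distributor, pos_tagger, raw_text, 10, 'en')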
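
For illustration, a custom distributor like the one described above might look roughly like the following. This is only a sketch under several assumptions: the class name Doc2VecDistributor and the from_file helper are hypothetical, the method name get_tokenized_sents_embeddings is assumed to match the abstract EmbeddingDistributor in the embeddings package (please check it there, and subclass that class in a real setup), and sentences are assumed to arrive as whitespace-tokenized strings. The gensim calls used are Doc2Vec.load and infer_vector.

```python
import numpy as np


class Doc2VecDistributor:
    """Hypothetical distributor that embeds sentences with a Doc2Vec model."""

    def __init__(self, model):
        # `model` is any object exposing infer_vector(list_of_tokens),
        # for example a trained gensim.models.doc2vec.Doc2Vec instance.
        self.model = model

    @classmethod
    def from_file(cls, model_path):
        # Hypothetical convenience constructor; gensim is imported lazily
        # so that the class itself only depends on numpy.
        from gensim.models.doc2vec import Doc2Vec
        return cls(Doc2Vec.load(model_path))

    def get_tokenized_sents_embeddings(self, sents):
        # Assumed method name; verify it against the repo's interface.
        # Each sentence is split on whitespace and embedded independently.
        vectors = [self.model.infer_vector(sent.split()) for sent in sents]
        # Stack into an array of shape (number of sentences, embedding dimension).
        return np.asarray(vectors)
```

With a trained model saved on disk, the distributor could then be built with Doc2VecDistributor.from_file(path_to_your_model) and passed to launch.extract_keyphrases in place of YourDoc2VecDistributor(...). Note that infer_vector is not necessarily deterministic between calls, so the resulting keyphrase ranking may vary slightly from run to run.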