swisscom / ai-research-keyphrase-extraction

EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings (official implementation)
Apache License 2.0
432 stars 88 forks

invalid value encountered in true_divide #19

Closed: Crescentz closed this issue 5 years ago

Crescentz commented 5 years ago

raw_text = 'this is the text i want to extract keyphrases from'

kp1 = launch.extract_keyphrases(embedding_distributor, pos_tagger, raw_text, 10, 'en')

/home1/zy/anaconda3/envs/py36pc2t/lib/python3.6/site-packages/ai-research-keyphrase-extraction/swisscom_ai/research_keyphrase/model/method.py:44: RuntimeWarning: invalid value encountered in true_divide
  0.5 + (sim_between_norm - np.nanmean(sim_between_norm, axis=0)) / np.nanstd(sim_between_norm, axis=0)

kamilbs commented 5 years ago

Hi, this can happen when you have a very small number of candidates (2 or fewer), which is not usually the case. Thanks for spotting this issue; I'll try to fix this special case. Meanwhile, you can try a slightly longer sentence (one that yields at least 3 candidates).
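A minimal sketch of why so few candidates can trigger the warning, using only the expression quoted in the traceback. The matrices, the function name `standardize_columns`, and the NaN-masked diagonal are assumptions made for illustration; they are not taken from the project's code. With 2 candidates, each column holds a single finite similarity, so its standard deviation is 0 and the division becomes 0 / 0.

```python
import warnings

import numpy as np


def standardize_columns(sim_between_norm):
    """Apply the expression quoted in the RuntimeWarning to a similarity matrix."""
    return 0.5 + (
        sim_between_norm - np.nanmean(sim_between_norm, axis=0)
    ) / np.nanstd(sim_between_norm, axis=0)


# Hypothetical similarity matrices; the diagonal is masked with NaN (assumption).
two_candidates = np.array([[np.nan, 0.8],
                           [0.8, np.nan]])
three_candidates = np.array([[np.nan, 0.8, 0.3],
                             [0.8, np.nan, 0.5],
                             [0.3, 0.5, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Each column has one finite value: mean == value, std == 0, so 0 / 0 -> NaN.
    result_two = standardize_columns(two_candidates)

# Each column has two distinct finite values, so the std is non-zero.
result_three = standardize_columns(three_candidates)

print(result_two)                       # every entry is NaN
print([str(w.message) for w in caught])  # includes the "invalid value" warning
print(result_three)                     # finite everywhere except the diagonal
```

In this toy setup the 3-candidate matrix standardizes cleanly, which is consistent with the suggestion to use input that yields at least 3 candidates.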

Crescentz commented 5 years ago

Thank you for your reply. It seems this can be worked around with:

import numpy as np
np.seterr(divide='ignore', invalid='ignore')

But it's not clear whether this carries any risk. By the way, sent2vec has been updated; will this project be updated to match?

kamilbs commented 5 years ago

Yes, you are totally right: what you did just silences the warning. There is in fact a risk: if you have only 2 candidates and want to extract only one (the best one), the result is wrong.
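To illustrate the point about silencing, here is a hedged sketch showing that suppressing the floating-point warning leaves the NaN values in place. It uses `np.errstate`, the scoped counterpart of `np.seterr`, on a hypothetical 2-candidate matrix. The helper `has_enough_candidates` and its threshold of 3 are assumptions drawn from the advice in this thread, not the project's actual fix.

```python
import warnings

import numpy as np


def standardize_columns(sim_between_norm):
    """Apply the expression quoted in the RuntimeWarning to a similarity matrix."""
    return 0.5 + (
        sim_between_norm - np.nanmean(sim_between_norm, axis=0)
    ) / np.nanstd(sim_between_norm, axis=0)


def has_enough_candidates(sim_matrix, minimum=3):
    """Hypothetical guard: True when the matrix covers at least `minimum` candidates."""
    return sim_matrix.shape[0] >= minimum


# Hypothetical 2-candidate similarity matrix with a NaN-masked diagonal (assumption).
two_candidates = np.array([[np.nan, 0.8],
                           [0.8, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same effect as np.seterr(divide='ignore', invalid='ignore'), but scoped.
    with np.errstate(divide="ignore", invalid="ignore"):
        silenced = standardize_columns(two_candidates)

print(len(caught))                 # no warning was recorded
print(np.isnan(silenced).all())    # yet every score is still NaN

if not has_enough_candidates(two_candidates):
    print("Too few candidates; skip the standardization step or use longer input.")
```

Checking the candidate count before ranking makes the special case explicit instead of letting NaN scores propagate into the selection.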

I'll have a look at the s2v updates and see what is needed to make this project work with the latest version. However, the update won't happen in the next few days unless someone wants to do it through a PR :)