lovit / KR-WordRank

A library that automatically extracts words and keywords from Korean text using unsupervised learning.

What does the r in keywords.items() mean? #4

Closed: KinamSalad closed 4 years ago

KinamSalad commented 5 years ago

```python
# Print the 30 highest-ranked keywords
for word, r in sorted(keywords.items(), key=lambda x: x[1], reverse=True)[:30]:
    print(word, r)
```

In this line, I can see the `r` that comes from `keywords.items()`. What does `r` mean? The sum of the `r` values is not a constant, and it does not seem to match the number of words in the vocabulary.

I want to use `r` as an indicator to compare the results from two different domains.

Thank you

lovit commented 5 years ago

KR-WordRank uses HITS, a graph ranking algorithm similar to PageRank. The variable `r` is the rank score of `word`.
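To illustrate what a graph rank score is, here is a minimal sketch of plain PageRank power iteration. This is a swapped-in technique, not KR-WordRank's actual HITS-based implementation: the function name `pagerank`, the damping value, and the toy word graph are all assumptions made for this example. The initial ranks sum to `total` (100 here), and every iteration preserves that sum as long as each node has at least one outgoing edge.

```python
def pagerank(graph, damping=0.85, total=100.0, max_iter=50):
    """Generic PageRank power iteration (not KR-WordRank's code).

    graph: dict mapping each node to the list of nodes it links to.
    Assumes every node has at least one outgoing edge, so the
    rank mass `total` is preserved at each iteration.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: total / n for v in nodes}  # start from uniform ranks
    for _ in range(max_iter):
        # Teleport term: each node receives an equal share of (1 - damping) * total
        new_rank = {v: (1 - damping) * total / n for v in nodes}
        for u in nodes:
            # Node u splits its damped rank evenly across its out-links
            share = damping * rank[u] / len(graph[u])
            for v in graph[u]:
                new_rank[v] += share
        rank = new_rank
    return rank

# Hypothetical toy word graph
toy_graph = {
    "데이터": ["분석", "학습"],
    "분석": ["데이터"],
    "학습": ["데이터", "분석"],
}
rank = pagerank(toy_graph)
for word, r in sorted(rank.items(), key=lambda x: x[1], reverse=True):
    print(word, round(r, 2))
```

Words that receive more links from highly ranked words end up with larger `r`, which is the intuition behind the rank score.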

The ranks of all nodes in the graph always sum to 100, but the sum of the ranks of only the top-ranked words can vary.
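A small sketch of why the top-ranked sum is not constant even though the total is: the two rank dictionaries below are hypothetical, each summing to 100, and `top_k_sum` is a helper name introduced only for this example.

```python
def top_k_sum(rank, k):
    """Return the sum of the k largest rank values."""
    return sum(sorted(rank.values(), reverse=True)[:k])

# Hypothetical rank dictionaries; both sum to 100
small_graph_rank = {"a": 50.0, "b": 30.0, "c": 20.0}                 # 3 nodes
large_graph_rank = {w: 100.0 / 10 for w in "abcdefghij"}            # 10 nodes, uniform

for name, rank in [("small", small_graph_rank), ("large", large_graph_rank)]:
    # Same total, different top-2 sums
    print(name, sum(rank.values()), top_k_sum(rank, 2))
```

Both graphs have a total of 100, but the top two words hold 80 in the small graph and only 20 in the larger one.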

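Because the total is fixed, the scale of individual ranks depends on the number of nodes: the more nodes a graph has, the smaller each rank tends to be. Therefore, to compare ranks from different graphs, you need to calibrate their scales by multiplying each rank value by the number of nodes.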
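The calibration can be written as a short derivation. Here $r_i$ is the rank of node $i$ and $n$ is the number of nodes; this notation is introduced only for the sketch:

```latex
\sum_{i=1}^{n} r_i = 100
\quad\Longrightarrow\quad
\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{100}{n},
\qquad
\tilde{r}_i = n\,r_i
\quad\Longrightarrow\quad
\frac{1}{n}\sum_{i=1}^{n} \tilde{r}_i = 100.
```

After calibration, the average rank is 100 in every graph, so a calibrated value above 100 means a word ranks above the average node of its own graph, regardless of how many nodes that graph has.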

For example,

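```python
from krwordrank.word import KRWordRank

# Illustrative parameter values; domain1_texts / domain2_texts are lists of str
wordrank_extractor = KRWordRank(min_count=5, max_length=10)

# Calibrate domain 1 ranks by its own number of keywords
domain1_keywords, _, _ = wordrank_extractor.extract(domain1_texts)
n_keywords1 = len(domain1_keywords)
domain1_keywords = {k: r * n_keywords1 for k, r in domain1_keywords.items()}

# Calibrate domain 2 ranks by its own number of keywords
domain2_keywords, _, _ = wordrank_extractor.extract(domain2_texts)
n_keywords2 = len(domain2_keywords)
domain2_keywords = {k: r * n_keywords2 for k, r in domain2_keywords.items()}
```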
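Once both domains are calibrated, words that appear in both can be compared side by side. The sketch below uses the same scaling as the example (multiplying by the number of keywords). `calibrate` and `compare_shared` are hypothetical helper names, and the two small dictionaries stand in for `extract()` output so the snippet runs without KR-WordRank installed.

```python
def calibrate(keywords):
    """Multiply each rank by the number of keywords in the same dict."""
    n = len(keywords)
    return {word: r * n for word, r in keywords.items()}

def compare_shared(keywords1, keywords2):
    """Map each word found in both domains to its (domain 1, domain 2) calibrated ranks."""
    c1, c2 = calibrate(keywords1), calibrate(keywords2)
    return {w: (c1[w], c2[w]) for w in c1.keys() & c2.keys()}

# Hypothetical rank dictionaries for two domains of different sizes
domain1 = {"영화": 40.0, "배우": 35.0, "감독": 25.0}
domain2 = {"영화": 10.0, "음악": 30.0, "가수": 35.0, "공연": 25.0}

print(compare_shared(domain1, domain2))
```

Here "영화" becomes 120.0 in the first domain and 40.0 in the second, so it is relatively more central in the first domain even after accounting for the different numbers of keywords.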

I hope this answer is helpful.

lovit commented 4 years ago

I think this issue is no longer active, so I am closing it.