better positions for extracting skip gram feature?

mickeysjm / HiExpan

The source code used for automatic taxonomy construction method HiExpan, published in KDD 2018

GNU General Public License v3.0


Hi Jiaming,

In the skip-gram feature extraction code, https://github.com/mickeystroller/HiExpan/blob/master/src/featureExtraction/extractSkipGramFeature.py, the candidate skip-gram positions are set to [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)] (line 30). However, I found that when the center word is the first word of a sentence, the position effectively becomes (0, 1) instead of (-1, 1), since there is no word before the center word. Maybe we should add positions such as (0, 1) and (0, 2). Otherwise, some entities will have the "a _ problem" feature but not the "_ problem" feature, which may hurt if "_ problem" later becomes an important feature. Thanks!
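To make the point concrete, here is a minimal sketch of the behavior I mean. It is not the repository's code: the function name `skipgram_features`, the `_` placeholder, and the way windows are clipped at the sentence start are my own assumptions for illustration, and the real boundary handling in extractSkipGramFeature.py may differ.

```python
# Illustrative sketch only; names and boundary handling are assumptions,
# not the actual HiExpan implementation.

# Positions quoted from the issue: (left offset, right window size).
ORIGINAL_POSITIONS = [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)]
# Proposed extension: windows with no left context at all.
EXTENDED_POSITIONS = ORIGINAL_POSITIONS + [(0, 1), (0, 2)]


def skipgram_features(tokens, center, positions):
    """Return skip-gram strings around tokens[center], with the center
    word replaced by "_". Windows are clipped at the sentence boundaries."""
    features = []
    for left, right in positions:
        start = max(0, center + left)  # clip at the start of the sentence
        left_ctx = tokens[start:center]
        right_ctx = tokens[center + 1:center + 1 + right]  # slicing clips at the end
        features.append(" ".join(left_ctx + ["_"] + right_ctx))
    return features


# Entity at the start of a sentence: the left context is empty.
at_start = skipgram_features(["ranking", "problem", "is", "hard"], 0, ORIGINAL_POSITIONS)

# Same entity in the middle of a sentence.
mid_tokens = ["solve", "a", "ranking", "problem"]
mid_original = skipgram_features(mid_tokens, 2, ORIGINAL_POSITIONS)
mid_extended = skipgram_features(mid_tokens, 2, EXTENDED_POSITIONS)

print("_ problem" in at_start)      # sentence-initial entity gets "_ problem"
print("_ problem" in mid_original)  # mid-sentence entity does not
print("_ problem" in mid_extended)  # with (0, 1) added, it does
```

Under these assumptions, a sentence-initial entity produces "_ problem" while the same entity in mid-sentence only produces "a _ problem" and longer variants, so the two occurrences share no feature unless (0, 1) is in the position list.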

Best, Jieyu


better positions for extracting skip gram feature? #6