JieyuZ2 opened 5 years ago
Thanks for this comment. I initially chose these six skip-gram positions to roughly align with the existing literature. You can definitely change them to other positions, and I think your proposal is very reasonable. You could run a comparative analysis; I look forward to seeing some empirical results. Thanks.
Hi Jiaming,
In the code that extracts skip-gram features (https://github.com/mickeystroller/HiExpan/blob/master/src/featureExtraction/extractSkipGramFeature.py), the possible skip-gram positions are set to [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)] (line 30). However, when the center word is the first word of a sentence, there is no word before it, so the (-1, 1) position effectively becomes (0, 1). Maybe we should add positions such as (0, 1) and (0, 2). Otherwise, some entities will have the "a problem" feature but not the " problem" feature, which may hurt later if "_ problem" becomes an important feature. Thanks!
Best, Jieyu
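To make the boundary case concrete, here is a minimal Python sketch. It is not the code from extractSkipGramFeature.py: the function name `skipgram_features`, the `__` entity placeholder, the clamping of window bounds (chosen to mirror the behavior described above), and the example sentences are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Skip-gram positions listed in the issue: (left offset, right offset) relative
# to the entity mention; e.g. (-2, 1) keeps 2 words before and 1 word after.
DEFAULT_POSITIONS: List[Tuple[int, int]] = [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)]
# Right-context-only positions suggested in the issue.
EXTRA_POSITIONS: List[Tuple[int, int]] = [(0, 1), (0, 2)]

PLACEHOLDER = "__"  # hypothetical token that stands in for the entity mention


def skipgram_features(tokens: Sequence[str], start: int, end: int,
                      positions: Sequence[Tuple[int, int]]) -> List[str]:
    """Build one skip-gram string per position for the mention tokens[start:end + 1].

    Window bounds are clamped to the sentence, so when the mention opens the
    sentence, a (-1, 1) window yields the same string as (0, 1).
    """
    features = []
    for left, right in positions:
        lo = max(0, start + left)               # clamp at the sentence start
        hi = min(len(tokens), end + 1 + right)  # clamp at the sentence end
        window = list(tokens[lo:start]) + [PLACEHOLDER] + list(tokens[end + 1:hi])
        features.append(" ".join(window))
    return features


if __name__ == "__main__":
    first = ["Overfitting", "is", "a", "problem"]           # mention at index 0
    middle = ["Here", "overfitting", "is", "a", "problem"]  # mention at index 1

    for positions in (DEFAULT_POSITIONS, DEFAULT_POSITIONS + EXTRA_POSITIONS):
        shared = set(skipgram_features(first, 0, 0, positions)) & set(
            skipgram_features(middle, 1, 1, positions))
        print(len(positions), sorted(shared))
```

With the six default positions, the sentence-initial mention gets "__ is" only through clamping, while the mid-sentence mention never gets it, so the two share no features even though both are followed by "is a problem". Adding (0, 1) and (0, 2) gives both mentions "__ is" and "__ is a". One alternative design, not proposed in this thread, is to skip windows that cross a sentence boundary rather than clamp them, so a clamped (-1, 1) string never duplicates the (0, 1) one.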