Open hanayashiki opened 5 years ago
After I edited seedLoader.py
from
if corpusName == "wiki":
    userInput = [
        ["ROOT", -1, ["united_states", "china", "canada"]],
        ["united_states", 0, ["california", "illinois", "florida"]],
        ["china", 0, ["shandong", "zhejiang", "sichuan"]],
    ]
to
if corpusName == "wiki":
    userInput = [
        ["ROOT", -1, ["United States", "China", "Canada"]],
        ["United States", 0, ["California", "Illinois", "Florida"]],
        ["China", 0, ["Shandong", "Zhejiang", "Sichuan"]],
    ]
it seems to work now. It appears that the phrases are not connected by "_", unlike what your paper describes.
Thanks for pointing this out. The seed entities need to appear in the generated entity2id.txt file. I think the phrases are connected with "_" during the embedding learning and corpus preprocessing stage but then converted back. Glad to hear you have started running the expansion code. Thanks.
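For anyone hitting the same mismatch, here is a minimal sketch of what such an underscore round-trip could look like. This is illustrative only, not HiExpan's actual preprocessing code; the function names and the case-restoring map are assumptions.

```python
# Illustrative sketch only -- HiExpan's real preprocessing code may differ.

def to_corpus_token(phrase: str) -> str:
    """Underscore-join a multi-word phrase for corpus preprocessing
    and embedding learning, e.g. "United States" -> "united_states"."""
    return phrase.lower().replace(" ", "_")

def to_entity_name(token: str, case_map: dict) -> str:
    """Map an underscore-joined token back to the surface name that
    would appear in entity2id.txt, using a mapping built during
    preprocessing (case_map here is a hypothetical helper)."""
    return case_map.get(token, token.replace("_", " "))

case_map = {"united_states": "United States"}
print(to_corpus_token("United States"))            # united_states
print(to_entity_name("united_states", case_map))   # United States
```

If the conversion back happens before entity2id.txt is written, that would explain why the seeds must use the space-separated surface forms rather than the underscored ones.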
I was using the preprocessed corpus downloaded from the links you provided. Maybe the sample inputs in seedLoader.py should be changed to be compatible with that corpus.
Hello, I would like to test HiExpan on the wiki corpus. After featureExtraction, I ran
to test. But after loading those files in wiki/intermediate, I got:
It seems that united_states is not included in those entities. What could possibly be wrong? Thank you.
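One quick way to debug this is to check directly whether each seed appears in the generated entity2id.txt. A minimal sketch, assuming the file is tab-separated with the entity name in the first column (the path and delimiter are assumptions; adjust them to your setup):

```python
def load_entities(path: str) -> set:
    """Read entity names (assumed first tab-separated column) from entity2id.txt."""
    with open(path, encoding="utf-8") as f:
        return {line.split("\t")[0] for line in f if line.strip()}

def missing_seeds(entities: set, seeds: list) -> list:
    """Return the seed entities that are absent from the vocabulary."""
    return [s for s in seeds if s not in entities]

# Hypothetical usage -- adjust the path to your intermediate directory:
# entities = load_entities("data/wiki/intermediate/entity2id.txt")
# print(missing_seeds(entities, ["united_states", "United States"]))
```

Running both the underscored and the space-separated variants through such a check would show immediately which surface form the downloaded corpus actually uses.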