sumit-agrwl opened this issue 2 years ago
I have added a new --wikidata_only flag that you can use for the fuse_items and extract_aliases tasks. This excludes inputs from Wikipedia. Please note that this means you will not get entity popularity counts in the alias table.
So which steps do I need to run? I have run everything up to import_wikidata.
Also, my ultimate aim is that, given a piece of text like “Who is the president of United States?”, it can extract Wikidata entities from it, like “president of United States”. If you could tell me what needs to be done, it would be helpful. I could see the parser for other stores, but I am not able to find any documentation for Wikidata as such.
You need to run the following tasks in addition to import_wikidata:
sling compute_fanin fuse_items build_kb extract_aliases build_phrasetab --wikidata_only
This will produce a knowledge base (kb.sling) and a phrase table (phrase-table.repo). You can use the phrase table to look up matching phrases, see https://github.com/ringgaard/sling/blob/master/doc/guide/pyapi.md#phrase-tables.
Since both the knowledge base and the phrase table are in memory, it is pretty fast to make lookups. You should be able to look up all subphrases up to a certain length (e.g. 10).
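A minimal sketch of that subphrase lookup with the Python API (the file paths are assumptions; adjust them to wherever your pipeline wrote kb.sling and phrase-table.repo). Since --wikidata_only gives you no popularity counts, this uses plain lookup rather than a count-ranked query:

import sling

# Load the knowledge base and the phrase table produced by the pipeline.
kb = sling.Store()
kb.load("data/e/kb/kb.sling")
names = sling.PhraseTable(kb, "data/e/kb/en/phrase-table.repo")
kb.freeze()

def match_subphrases(text, max_len=10):
  # Naive whitespace tokenization; look up every subphrase of up to
  # max_len tokens in the phrase table.
  tokens = text.split()
  matches = []
  for start in range(len(tokens)):
    for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
      phrase = " ".join(tokens[start:end])
      for item in names.lookup(phrase):
        matches.append((phrase, item.id, item.name))
  return matches

for phrase, itemid, name in match_subphrases("Who is the president of United States"):
  print(phrase, "->", itemid, name)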
Since you have made the changes for the flag in the source code, I am assuming I need to build from source.
I cloned the repo and ran setup.sh, but it's giving me this error:
ln: failed to create symbolic link '/usr/lib/python3.7/site-packages/sling': No such file or directory
After that, I assumed I needed to do this, per the instructions:
If you haven't run the setup.sh script already, you then need to link the sling Python module directly to the Python source directory to use it in "developer mode":
sudo ln -s $(realpath python) /usr/lib/python3/dist-packages/sling
For which I ran:
sudo ln -s /usr/bin/python3 /usr/lib/python3/dist-packages/sling
But the sling command is still not working.
If your sling directory is /home/bob/sling, I think the ln command should be something like:
sudo ln -s /home/bob/sling/python /usr/lib/python3/dist-packages/sling
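As a quick sanity check (not from the official docs), you can then try importing the module:

python3 -c 'import sling; print(sling.__file__)'

If that prints a path under your sling checkout, the link is set up correctly.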
You can also just wait until tomorrow, when the changes will have been included in the nightly build.
Thank you for your prompt responses. I was able to run the command! However, the pipeline now fails with this error:
[2022-04-30 16:37:09.474628: F sling/task/task.cc:215] Input config is missing for task fused-items/item-reconciler
Seems like the config is not optional for the item reconciler. Could I get you to try adding the auxin parameter in kb.py:
# Pass the reconciler configuration as an auxiliary input so the
# item-reconciler task gets the config it requires.
return self.wf.mapreduce(input=items,
                         output=output,
                         mapper="item-reconciler",
                         reducer="item-merger",
                         format="message/frame",
                         params={"indexed": True},
                         auxin={"config": self.recon_config()})
Thanks for your prompt reply. It's running now.
I am not sure I understood this. My question is still that, given a query like "Who is the president of United States?", it should extract "president of United States" as an entity that matches "President of the United States" (https://ringgaard.com/kb/Q11696). I am hoping there is some kind of SLING parser that can do that, but I cannot find any documentation or process for it. Also, it would be helpful if I could do the entity linking in different languages; I think there is support for that in this project, but I am not able to figure it out.
One more question: I need to change "Who is the president of United States?" into a version with the entity in a different language, e.g. "Who is the [Presidente de los Estados Unidos]?" (this would use the aliases in different languages). Currently, after running the steps you suggested, I can only get the English name and no aliases.
If you want to match entity names in other languages, you can use the --language flag when generating the phrase tables (extract_aliases and build_phrasetab). I should note that Wikidata is not "language-dependent" in the same sense as Wikipedia.
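For example, to also build a Spanish alias table, it should be something like (the es output path is an assumption based on the default layout):

sling extract_aliases build_phrasetab --wikidata_only --language es

and then load it alongside the English one:

# Look up a Spanish alias in the Spanish phrase table.
names_es = sling.PhraseTable(kb, "data/e/kb/es/phrase-table.repo")
for item in names_es.lookup("Presidente de los Estados Unidos"):
  print(item.id, item.name)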
While there is a semantic parser and some entity resolution components in SLING, this is not really going to solve your problem. What you are asking for is really a question-answering system. This is a difficult research problem that many researchers and companies are actively working on. There are no simple solutions, but if you search for this, you will find many articles describing different approaches to the problem, each with their own strengths and weaknesses.
I don't want to use Wikipedia for any processing. I just want to use Wikidata for entity matching in different languages. Can you guide me through the steps? I am assuming I only need to work with Wikidata.