Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a "semantic representation" of the text for NLP or machine learning, or simply to provide links to background information on Wikipedia.
To install, run the following from the top-level source directory::

    pip install -r requirements.txt
    pip install .
To train a semanticizer, download a Wikipedia database dump from
https://dumps.wikimedia.org/. Then issue::

    python -m semanticizest.parse_wikidump <dump> <model-filename>

The result will be a semanticizer model (in SQLite 3 format, if you must know).
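Because the model is an ordinary SQLite 3 database, it can be inspected with Python's standard ``sqlite3`` module. The sketch below only lists the tables in a model file; it assumes nothing about their names or layout, which are internal to semanticizest, and ``list_model_tables`` is a hypothetical helper, not part of the package.

```python
import sqlite3
from pathlib import Path


def list_model_tables(model_path):
    """Return the sorted table names in a semanticizer model file.

    Hypothetical helper: it relies only on the model being a SQLite 3
    database. The file is opened read-only, so a mistyped path raises
    sqlite3.OperationalError instead of silently creating an empty database.
    """
    uri = Path(model_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

For example, ``list_model_tables("sco.model")`` could be called after building the Scottish Wikipedia model described below.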
Alternatively, you can use the ``--download`` flag to instruct semanticizest to
download the latest Wikipedia dump. For example, to download and process the
`Scottish Wikipedia`_ (which is small and useful for testing)::

    python -m semanticizest.parse_wikidump --download scowiki sco.model

This will download
https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2
to ``scowiki.xml.bz2`` and construct the model from it.
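The download location used by ``--download`` follows a simple pattern, sketched below as a hypothetical helper. It assumes that other language editions use the same ``<wiki>/latest/<wiki>-latest-pages-articles.xml.bz2`` layout as the scowiki example; ``latest_dump_location`` and ``fetch_latest_dump`` are invented names and are not part of semanticizest's API.

```python
import urllib.request

# Base URL from the README; the path layout below is assumed to match the
# scowiki example for other wikis as well.
DUMPS_BASE = "https://dumps.wikimedia.org"


def latest_dump_location(wiki):
    """Return (url, local_filename) for the latest pages-articles dump.

    Hypothetical helper: it only reproduces the URL and local file name
    described for ``--download scowiki``.
    """
    remote_name = f"{wiki}-latest-pages-articles.xml.bz2"
    url = f"{DUMPS_BASE}/{wiki}/latest/{remote_name}"
    return url, f"{wiki}.xml.bz2"


def fetch_latest_dump(wiki):
    """Download the latest dump for `wiki` into the current directory."""
    url, local_name = latest_dump_location(wiki)
    urllib.request.urlretrieve(url, local_name)
    return local_name
```

Fetching the file yourself this way would presumably let you pass ``scowiki.xml.bz2`` as ``<dump>`` to ``parse_wikidump`` without ``--download``, which can be handy when the same dump is reused for several models.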
Full documentation can be found at https://semanticize.github.io/semanticizest/
Copyright 2014 University of Amsterdam/Netherlands eScience Center.
The license for semanticizest is `Apache License, Version 2.0`_.
See the file LICENSE for details.
.. _Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.html
.. _Scottish Wikipedia: https://sco.wikipedia.org