uhh-lt / wsd

A system for unsupervised knowledge-free interpretable word sense disambiguation based on distributional semantics
http://jobimtext.org/wsd
GNU General Public License v3.0

Unsupervised Knowledge Free Word Sense Disambiguation

Software for constructing and visualizing Word Sense Disambiguation models based on JoBimText models. This project implements the method described in the following paper; please cite it if you use the software in a research project:

@inproceedings{Panchenko:17:emnlp,
  author    = {Panchenko, Alexander and Marten, Fide and Ruppert, Eugen and Faralli, Stefano  and Ustalov, Dmitry and Ponzetto, Simone Paolo and Biemann, Chris},
  title     = {{Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation}},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  language  = {english}
}

Prerequisites

Serving the WSD model

Online demo

Download precalculated DB and pictures

We provide a ready-to-use database and a dump of pictures for all senses in the database. Downloading and unpacking them requires 300 GB of free disk space!

To download both artifacts and prepare the project with them, run:

./wsd model:download
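Because the download needs so much space, it can help to check the free disk space first. The following is a minimal sketch, not part of the wsd script: `has_free_space` is a hypothetical helper name, and it treats 1 GB as 1024^3 bytes when comparing against the output of `df`.

```shell
# Hypothetical helper, not part of the wsd script.
# Succeeds if the file system holding directory $1 has at least $2 GB available.
has_free_space() {
  dir="$1"
  need_gb="$2"
  # df -P prints one line per file system; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( $need_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_space . 300 && ./wsd model:download` starts the download only if the file system of the current directory has at least 300 GB available.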

Note: For instructions on how to rebuild the DB with the model yourself, see the section "Build your own DB" below.

Start the web application

To start the application:

./wsd web-app:start

The web application runs with Docker Compose. To customize your installation, adjust docker-compose.override.yml. See the official Docker Compose documentation for general information on this file.

To inspect the running containers, you can use any Docker Compose command, such as docker-compose ps or docker-compose logs.
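These status checks can be bundled into a small helper. This is only a sketch: `wsd_status` and the `COMPOSE` variable are hypothetical names that do not come from the project, and the helper assumes it is run from the directory that contains the project's Compose files. Only `ps` and `logs --tail` are standard Docker Compose commands.

```shell
# Compose command to use; set COMPOSE="docker compose" on installations
# that ship Compose as a Docker plugin instead of a separate binary.
COMPOSE="${COMPOSE:-docker-compose}"

# Hypothetical helper: print the container states, then the last N log
# lines (default 50) of every service.
wsd_status() {
  tail_lines="${1:-50}"
  $COMPOSE ps
  $COMPOSE logs --tail="$tail_lines"
}
```

Calling `wsd_status 200` prints the container list followed by the last 200 log lines per container.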

Build your own DB

First, set the $SPARK_HOME environment variable or make sure that spark-submit is on your PATH.
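To verify this prerequisite before starting a build, a check along the following lines may be useful. It is a sketch under assumptions: `find_spark_submit` is a hypothetical helper, the `$SPARK_HOME/bin/spark-submit` location follows the standard Spark distribution layout, and the project's own scripts may resolve the binary differently.

```shell
# Hypothetical helper: print the spark-submit binary that would be used,
# preferring $SPARK_HOME over the PATH. Returns non-zero if none is found.
find_spark_submit() {
  if [ -n "${SPARK_HOME:-}" ] && [ -x "$SPARK_HOME/bin/spark-submit" ]; then
    printf '%s\n' "$SPARK_HOME/bin/spark-submit"
  else
    command -v spark-submit
  fi
}
```

For example, `find_spark_submit || echo "spark-submit not found" >&2` reports a missing Spark installation before any long-running step begins.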

To adjust the amount of memory used by Spark, modify the script scripts/spark_submit_jar.sh (consider changing --conf 'spark.driver.memory=4g' and --conf 'spark.executor.memory=1g').
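Editing the script by hand is the simplest route. As an alternative, the sketch below previews the change with `sed`. It assumes the script contains the two settings in exactly the `key=value` form quoted above; `with_spark_memory` is a hypothetical helper, and the values 8g and 2g in the usage note are illustrative only.

```shell
# Hypothetical helper: print the contents of script $1 with the Spark driver
# memory set to $2 and the executor memory set to $3. The file itself is
# left untouched; redirect the output to keep the result.
with_spark_memory() {
  sed -e "s/spark\.driver\.memory=[0-9]*[a-z]/spark.driver.memory=$2/" \
      -e "s/spark\.executor\.memory=[0-9]*[a-z]/spark.executor.memory=$3/" \
      "$1"
}
```

For example, `with_spark_memory scripts/spark_submit_jar.sh 8g 2g | diff scripts/spark_submit_jar.sh -` shows which lines would change without modifying the script.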

We recommend first building a toy model from a toy training data set, which takes only a few minutes.

Build small toy model

./wsd model:build-toy

This model only provides senses for the word "Python" but is fully functional.

Build full model

Building the full model takes nearly 11 hours on an eight-core machine with 30 GB of memory and needs around 300 GB of free disk space. It also downloads 4 GB of training data.

./wsd model:build-full
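Since the build runs for many hours, you may want it to survive a closed terminal or a dropped SSH session. One possible approach, sketched here with a hypothetical `run_detached` helper that is not part of the wsd script, is to start the command with `nohup` and send all output to a log file.

```shell
# Hypothetical helper: start a command in the background, immune to hangups,
# with stdout and stderr collected in a log file.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
  log="$1"
  shift
  nohup "$@" > "$log" 2>&1 &
  echo "$!"   # PID of the background job
}
```

For example, `run_detached build-full.log ./wsd model:build-full` starts the build and prints its process ID; `tail -f build-full.log` then follows the progress.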

See also

./wsd --help