short-text-classification

This repo contains a command line tool for training short text classification models under different settings. Created mainly as part of a thesis on the subject, the tool trains the deep learning models described in the papers referenced below, and lets users easily experiment with different loss functions, optimizers, datasets and word embeddings.

The implemented models are the ones described in the papers referenced below. I may implement models from a few more papers as my thesis work expands.

Note that, for the time being, these models may yield much lower training, validation and test accuracy than expected, due to implementation issues.

Requirements

This tool has several prerequisites. It uses Keras, along with all of Keras's own prerequisites, and it also requires TensorFlow.

Additionally, although the tool accepts a couple of different word embeddings as input, the embedding files themselves must be downloaded separately and, if necessary, unzipped.

Similarly, supported datasets must be downloaded and/or unzipped by the user. Currently, the only supported dataset is SwDA, which is also included in the swda submodule inside the repo. However, the user must unzip the file into a directory of their choice, as the tool does not handle unzipping.
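
Since the tool does not extract anything itself, the unzipping step can be done with Python's standard library. The following is only a minimal sketch, assuming the archive is a regular zip file; the function name `extract_archive` and the example paths are placeholders and are not part of this repo.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the archived names."""
    target = Path(target_dir)
    # Create the destination directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example call (hypothetical paths, substitute your own):
# extract_archive("path/to/archive.zip", "path/to/extracted_dataset")
```

The directory passed as `target_dir` is the one to hand to the tool afterwards as the dataset directory.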

Finally, if the --save-model option is used, the Python module h5py is required; it can be installed via pip.
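
Before running the tool, it may help to check which of these libraries are importable. The snippet below is a small sketch of such a check, not something shipped with the repo; the helper `missing_modules` is hypothetical, and the module names are simply the usual import names of the libraries mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the libraries listed in this section.
REQUIRED = ["keras", "tensorflow"]
OPTIONAL = ["h5py"]  # only needed when --save-model is used

if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

An empty list means every module in that group was found; any name that is printed still needs to be installed.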

Word Embeddings:

Word2Vec: Download file from here
GloVe: Download file from here

Sample usage

The tool may be used in a couple of ways.

Using the --loss-functions or --optimizers option, it may be used to list the loss functions or optimizers supported by the Keras version you are using.

foo@bar:~$ python core.py --loss-functions

Using the --embeddings or --datasets option, it may be used to list the supported word embeddings or datasets, respectively.

foo@bar:~$ python core.py --embeddings

Similarly, using the --models option, the list of implemented models may be printed.

foo@bar:~$ python core.py --models

Finally, to train a specific model by specifying a dataset, an embedding, a loss function and an optimizer, you may use a command similar to the one given below.

foo@bar:~$ python core.py --model Lee-Dernoncourt \
                          --dataset SwDA <path_to_SwDA_dataset_directory> \
                          --embedding GloVe <path_to_GloVe_embedding_file>
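
In the command above, --dataset and --embedding each take two values: a name and a path. The sketch below shows one way such options could be declared with argparse. It is an assumption made for illustration and not the actual parser in core.py; the paths are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options in the sample command."""
    parser = argparse.ArgumentParser(prog="core.py")
    parser.add_argument("--model", help="name of the model to train")
    # nargs=2 makes each option consume exactly two values: a name and a path.
    parser.add_argument("--dataset", nargs=2, metavar=("NAME", "PATH"),
                        help="dataset name and the directory it was unzipped into")
    parser.add_argument("--embedding", nargs=2, metavar=("NAME", "PATH"),
                        help="embedding name and the path to its file")
    return parser


# Parse an argument list equivalent to the sample command (placeholder paths).
args = build_parser().parse_args([
    "--model", "Lee-Dernoncourt",
    "--dataset", "SwDA", "/path/to/swda",
    "--embedding", "GloVe", "/path/to/glove.txt",
])
dataset_name, dataset_path = args.dataset
embedding_name, embedding_path = args.embedding
print(args.model, dataset_name, embedding_name)
```

With `nargs=2`, argparse rejects the command if either the name or the path is left out, so a forgotten path is reported before any training starts.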

For a more detailed description of the capabilities of the tool, use the --help option.

foo@bar:~$ python core.py --help
