
Conversation Models in Tensorflow

Notes to visitors:


Project Overview

As of May 9, 2017, the main packages of the project are as follows:

From a user/developer standpoint, this project offers a cleaner interface for tinkering with sequence-to-sequence models. The ideal result is a chatbot API with the readability of Keras, but with a degree of flexibility closer to TensorFlow.

On the 'client' side, playing with model parameters and running them is as easy as writing a configuration (YAML) file, opening a Python interpreter, and issuing a handful of commands. The following snippet, for example, is all that is needed to start training on the Cornell dataset (after downloading it, of course) with your configuration:

    import data
    import chatbot
    from utils import io_utils

    # Load the config dictionary with the flexible parse_config() function,
    # which can handle various inputs for building your config dictionary.
    config = io_utils.parse_config(config_path='path_to/my_config.yml')

    # Look up the dataset and model classes named in the config, build them,
    # and start training.
    dataset = getattr(data, config['dataset'])(config['dataset_params'])
    bot = getattr(chatbot, config['model'])(dataset, config)
    bot.train()

This is just one way to interface with the project. For example, the user can also pass in parameters via command-line args, which are merged with any config files they specify (command-line args take precedence in case of conflict). You can also pass in the location of a previously saved chatbot to resume training it or start a conversation. See main.py for more details.

Datasets

Models

Website

The webpage directory showcases a simple and space-efficient way of deploying your TensorFlow models in a Flask application. The models are 'frozen' -- all components not needed for chatting (e.g. optimizers) are removed, and all remaining variables are converted to constants. When the user clicks on a model name, a REST API for that model is created. When the user enters a sentence into the form, an (AJAX) POST request is issued, and the response is the chatbot's reply sentence. For more details on the REST API, see views.py.
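As a rough sketch of what freezing looks like in the TensorFlow 1.x API, `tf.graph_util.convert_variables_to_constants` prunes everything not needed to compute the named outputs and bakes the weights into the graph. The checkpoint path and output node name below are illustrative placeholders, not the project's actual values:

    import tensorflow as tf

    # Minimal freezing sketch (TF 1.x). Paths and node names are
    # placeholders for illustration only.
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph('ckpt/model.meta')
        saver.restore(sess, 'ckpt/model')
        # Convert variables to constants; ops not needed to compute the
        # output nodes (e.g. the optimizer) are pruned from the graph.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names=['decoder/outputs'])
        with tf.gfile.GFile('frozen_model.pb', 'wb') as f:
            f.write(frozen.SerializeToString())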

The Flask application follows best practices, such as using blueprints and an application factory to instantiate the app, selecting different databases depending on the application environment (e.g. development or production), and more.
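A minimal sketch of that pattern is below; all route, function, and config names here are illustrative, not the project's actual ones:

    from flask import Blueprint, Flask, jsonify, request

    # Hypothetical blueprint holding the chat routes.
    chat = Blueprint('chat', __name__)

    @chat.route('/chat', methods=['POST'])
    def chat_response():
        # In the real app, a frozen model would generate the reply;
        # echoing the input sentence is a stand-in here.
        sentence = request.get_json().get('sentence', '')
        return jsonify({'response': sentence})

    def create_app(env='development'):
        """Application factory: configure per environment, then
        register blueprints."""
        app = Flask(__name__)
        app.config['DEBUG'] = (env == 'development')
        # A real app would also select e.g. the database URI here,
        # based on the environment.
        app.register_blueprint(chat)
        return app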

Model Components

Here I'll go into more detail on how the models are constructed and how they can be visualized. This section is a work in progress.

The Input Pipeline

Instead of using the feed_dict argument to input data batches to the model, it is substantially faster to encode the input information and preprocessing steps in the graph structure itself. This means we don't feed the model anything at training time. Rather, the model uses a sequence of queues to access the data from files in Google's protocol buffer format, decode the files into tensor sequences, dynamically batch and pad the sequences, and then feed these batches to the embedding decoder, all within the graph structure. Furthermore, this data processing is coordinated by multiple threads in parallel. We can use TensorBoard (and best practices for variable scoping) to visualize this type of pipeline at a high level.

(TensorBoard graph visualizations: input_pipeline and input_pipeline_expanded.)
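The pipeline described above can be sketched with the TF 1.x queue-based API roughly as follows; the file name and the 'tokens' feature key are assumptions for illustration, not the project's actual ones:

    import tensorflow as tf

    # Queue of input file names; a TFRecordReader pulls records from it.
    filename_queue = tf.train.string_input_producer(['train.tfrecords'])
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)

    # Decode one protobuf record into a variable-length token sequence.
    _, sequences = tf.parse_single_sequence_example(
        serialized,
        sequence_features={'tokens': tf.FixedLenSequenceFeature([], tf.int64)})

    # Batch sequences, padding each batch to its longest element;
    # multiple threads fill the queue in parallel. At runtime,
    # tf.train.start_queue_runners() launches those threads.
    batch = tf.train.batch([sequences['tokens']], batch_size=32,
                           num_threads=4, capacity=1000, dynamic_pad=True)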



(More descriptions coming soon!)

Reference Material

A lot of research has gone into these models, and I've been documenting my notes on the most "important" papers in the last section of my deep learning notes. The notes also describe how I've tried translating the material from the papers into TensorFlow code. I'll be updating them as the ideas from more papers make their way into this project.