OpenChatKit

OpenChatKit provides a powerful, open-source base to create both specialized and general purpose models for various applications. The kit includes an instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. OpenChatKit models were trained on the OIG-43M training dataset, which was a collaboration between Together, LAION, and Ontocord.ai.

In this repo, you'll find code for:

Training GPT-NeoXT-Chat-Base-20B, a 20B parameter chat model (see docs/GPT-NeoXT-Chat-Base-20B.md)
Fine-tuning Llama-2-7B-32K-beta, a 7B parameter long context model
Training Pythia-Chat-Base-7B, a 7B parameter chat model
Testing inference using either of the chat models
Augmenting the model with additional context from a retrieval index

Getting Started
- Requirements
- Chatting with Pythia-Chat-Base-7B
Fine-tuning Llama-2-7B-32K-beta
Reproducing Pythia-Chat-Base-7B
Monitoring
- Loguru
- Weights & Biases
Experimental: Retrieval-Augmented Models
See Also
License
Citing OpenChatKit
Acknowledgements

Getting Started

In this tutorial, you will download Pythia-Chat-Base-7B, an instruction-tuned language model, and run some some inference requests against it using a command-line tool.

Pythia-Chat-Base-7B is a 7B-parameter fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. Pre-trained weights for this model are available on Hugging Face as togethercomputer/Pythia-Chat-Base-7B under an Apache 2.0 license.

More details can be found on the model card for Pythia-Chat-Base-7B on Hugging Face.

Requirements

Before you begin, you need to install PyTorch and other dependencies.

Install Miniconda from their website.
Install Git LFS from their website.
Install the git lfs hooks.

git lfs install

Install mamba in the base environment so it's available in all environments.

conda install mamba -n base -c conda-forge

Create an environment called OpenChatKit using the environment.yml file at the root of this repo.

Note Use mamba to create the environment. It's much faster than using conda.

mamba env create -f environment.yml

Activate the new conda environment.

conda activate OpenChatKit

Chatting with Pythia-Chat-Base-7B

To help you try the model, inference/bot.py is a simple command-line test harness that provides a shell inferface enabling you to chat with the model. Simply enter text at the prompt and the model replies. The test harness also maintains conversation history to provide the model with context.

Start the bot by calling bot.py from the root for the repo.

python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B

Loading the model can take some time, but once it's loaded, you are greeted with a prompt. Say hello.

$ python inference/bot.py 
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Hello.
Hello human.

>>>

Enter additional queries at the prompt, and the model replies. Under the covers, the shell is forming a prompt with all previous queries and passes that to the model to generate more text.

The shell also supports additional commands to inspect hyperparamters, the full prompt, and more. Commands are prefixed with a /.

Note The /quit command exits the shell.

Please see the inference README for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.

Fine-tuning Llama-2-7B-32K-beta

Llama-2-7B-32K-beta model can be fine-tuned using various datasets. In this tutorial, we will use the multi-document natural questions dataset and BookSum dataset.

Downloading and converting the base model

To download model Llama-2-7B-32K-beta and prepare it for fine-tuning, run this command from the root of the repository.

python pretrained/Llama-2-7B-32K-beta/prepare.py

The weights for this model will be in the pretrained/Llama-2-7B-32K-beta/togethercomputer_Llama-2-7B-32K-beta directory.

Fine-tuning the model

The training/finetune_llama-2-7b-32k-mqa.sh and training/finetune_llama-2-7b-32k-booksum.sh scripts configure and run the training loop.

To fine-tune the multi-document natural questions dataset, run:
```
bash training/finetune_llama-2-7b-32k-mqa.sh
```

To fine-tune the BookSum dataset, run:

bash training/finetune_llama-2-7b-32k-booksum.sh

As the training loop runs, checkpoints are saved to the model_ckpts directory at the root of the repo.

Please see the training README for more details about customizing the training run.

Converting trained weights to Hugging Face format

Before you can use this model to perform inference, it must be converted to the Hugging Face format. Run this command from the root of the repo to do so.

For example

mkdir huggingface_models \
  && python tools/convert_to_hf_llama.py \
       --config-name togethercomputer/Llama-2-7B-32K-beta \
       --ckpt-path model_ckpts/llama-2-7b-32k-mqa/checkpoint_10 \
       --save-path huggingface_models/llama-2-7b-32k-mqa \
       --n-stages 4 \
       --n-layer-per-stage 8 \
       --fp16

where the --fp16 flag will load and store models in fp16.

Make sure to replace model_ckpts/llama-2-7b-32k-mqa/checkpoint_10with the latest checkpoint in themodel_ckpts/llama-2-7b-32k-mqaormodel_ckpts/llama-2-7b-32k-booksum` directory.

Reproducing Pythia-Chat-Base-7B

This tutorial walks through reproducing the Pythia-Chat-Base-7B model by fine-tuning Eleuther AI's Pythia-6.9B-deduped model using the OIG dataset.

Downloading training data and the base model

The chat model was trained on the OIG dataset built by LAION, Together, and Ontocord.ai. To download the dataset from Hugging Face run the command below from the root of the repo.

python data/OIG/prepare.py

Note You can help make this chat model better by contributing data! See the OpenDataHub repo for more details.

Once the command completes, the data will be in the data/OIG/files directory.

Pythia-Chat-Base-7B is a fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. To download the model and prepare it for fine tuning, run this command from the root of the repo.

python pretrained/Pythia-6.9B-deduped/prepare.py

The weights for this model will be in the pretrained/Pythia-6.9B-deduped/EleutherAI_pythia-6.9b-deduped directory.

(Optional) 8bit Adam

To use 8bit-adam during training, install the bitsandbytes package.

pip install bitsandbytes # optional, to use 8bit-adam

Training the model

The training/finetune_Pythia-Chat-Base-7B.sh script configures and runs the training loop. After downloading the dataset and the base model, run:

bash training/finetune_Pythia-Chat-Base-7B.sh

As the training loop runs, checkpoints are saved to the model_ckpts directory at the root of the repo.

Please see the training README for more details about customizing the training run.

Converting weights to Hugging Face format

Before you can use this model to perform inference, it must be converted to the Hugging Face format. Run this command from the root of the repo to do so.

mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
       --config-name EleutherAI/pythia-6.9b-deduped \
       --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
       --save-path huggingface_models/Pythia-Chat-Base-7B \
       --n-stages 4 \
       --n-layer-per-stage 8 \
       --fp16

where the --fp16 flag will load and store models in fp16.

Make sure to replace model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 with the latest checkpoint in the model_ckpts/Pythia-Chat-Base-7B directory.

Testing the new model

You can use the OpenChatKit Shell test harness to chat with the new model. From the root of the repo, run

python inference/bot.py

By default the script will load the model named Pythia-Chat-Base-7B under the huggingface_models directory, but you can override that behavior by specifying --model.

python inference/bot.py --model ./huggingface_models/GPT-NeoXT-Chat-Base-20B

Once the model has loaded, enter text at the prompt and the model will reply.

$ python inference/bot.py 
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Hello.
Hello human.

>>>

The shell also supports additional commands to inspect hyperparamters, the full prompt, and more. Commands are prefixed with a /.

Note The /quit command exits the shell.

Please see the inference README for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.

Monitoring

By default, the training script simply prints the loss as training proceeds, but it can also output metrics to a file using loguru or report them to Weights & Biases.

Loguru

Add the flag --train-log-backend loguru to your training script to log to ./logs/file_{time}.log

Weights & Biases

To use Weights & Biases, first login with your Weights & Biases token.

wandb login

And set --train-log-backend wandb in the training script to enable logging to Weights & Biases.

Experimental: Retrieval-Augmented Models

Warning Retrieval support is experimental.

The code in /retrieval implements a python package for querying a Faiss index of Wikipedia. The following steps explain how to use this index to augment queries in the test harness with context from the retriever.

Download the Wikipedia index.

python data/wikipedia-3sentence-level-retrieval-index/prepare.py

Run the bot with the --retrieval flag.

python inference/bot.py --retrieval

After starting, the bot will load both the chat model and the retrieval index, which takes a long time. Once the model and the index are loaded, all queries will be augmented with extra context.

$ python inference/bot.py --retrieval
Loading /OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...
Loading retrieval index...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Where is Zurich?
Where is Zurich?
Zurich is located in Switzerland.

>>>

License

All code in this repository was developed by Together Computer except where otherwise noted. Copyright (c) 2023, Together Computer. All rights reserved. The code is licensed under the Apache 2.0 license.

Copyright 2023 Together Computer

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.

For full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing please contact us.

Citing OpenChatKit

@software{openchatkit,
  title = {{OpenChatKit: An Open Toolkit and Base Model for Dialogue-style Applications}},
  author = {Together Computer},
  url = {https://github.com/togethercomputer/OpenChatKit}
  month = {3},
  year = {2023},
  version = {0.15},
}

Acknowledgements

Our models are fine-tuned versions of large language models trained by Eleuther AI. We evaluated our model on HELM provided by the Center for Research on Foundation Models. And we collaborated with both CRFM and HazyResearch at Stanford to build this model.

We collaborated with LAION and Ontocord.ai to build the training data used to fine tune this model.

togethercomputer / OpenChatKit

readme