SpikeGPT is a lightweight generative language model with pure binary, event-driven spiking activation units. The arXiv paper for SpikeGPT can be found here.
If you are interested in SpikeGPT, feel free to join our Discord using this link!
This repo is inspired by the RWKV-LM.
If you find yourself struggling with environment configuration, consider using the Docker image for SpikeGPT available on GitHub.
Training on Enwik8:
Download the enwik8 dataset by visiting the following link: enwik8 dataset.
Modify the train, validation, and test set paths in the train.py script to match the directory where you extracted the files. For example, if you extracted the files to a directory named enwik8_data, your train.py script should be updated as follows:

```python
# Set the paths for the datasets
datafile_train = "path/to/enwik8_data/train"
datafile_valid = "path/to/enwik8_data/validate"
datafile_test = "path/to/enwik8_data/test"
```
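If you prefer to script the preparation step, a minimal sketch is shown below. It assumes the conventional enwik8 split (the first 90M bytes for training, then 5M for validation and 5M for testing) and writes files named train, validate, and test into enwik8_data, matching the example paths above; whether this raw-byte format is what your copy of train.py expects is for you to verify.

```python
# Minimal sketch: download enwik8 and split it into train/validate/test files.
# The 90M/5M/5M byte split is an assumption (the common enwik8 convention).
import os
import urllib.request
import zipfile

url = "http://mattmahoney.net/dc/enwik8.zip"
out_dir = "enwik8_data"
os.makedirs(out_dir, exist_ok=True)

zip_path = os.path.join(out_dir, "enwik8.zip")
urllib.request.urlretrieve(url, zip_path)

with zipfile.ZipFile(zip_path) as zf:
    data = zf.read("enwik8")  # 100M bytes of Wikipedia text

splits = {
    "train": data[:90_000_000],
    "validate": data[90_000_000:95_000_000],
    "test": data[95_000_000:],
}
for name, chunk in splits.items():
    with open(os.path.join(out_dir, name), "wb") as f:
        f.write(chunk)
```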
Pre-Training on a Large Corpus:
1. Downloading Pre-Tokenized WikiText-103: download the pre-tokenized WikiText-103 binidx file from this Hugging Face dataset link.
2. Configuring the Training Script: in train.py, uncomment line 82 to enable MMapIndexedDataset as the dataset class, and set datafile_train to the filename of your binidx file, omitting the .bin or .idx file extensions (see the sketch after this list).
3. Starting Multi-GPU Training.
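A rough sketch of what step 2 amounts to is shown below. It is illustrative only: the import path, variable names, and the placeholder filename wikitext-103_text_document are assumptions, so consult train.py itself for the real code.

```python
# Illustrative sketch of the binidx configuration in train.py once the
# MMapIndexedDataset option is uncommented. Import path and names are assumed.
from src.binidx import MMapIndexedDataset  # assumed module location in this repo

# Point datafile_train at the binidx pair WITHOUT the .bin / .idx extension.
datafile_train = "path/to/wikitext-103_text_document"  # placeholder filename
train_dataset = MMapIndexedDataset(datafile_train)
```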
Fine-Tuning the Model:
When fine-tuning from a pre-trained checkpoint, use a learning rate of 3e-6.
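As a purely hypothetical illustration of where that setting lives (the actual hyperparameter and checkpoint-loading names in train.py may be spelled differently), fine-tuning boils down to resuming from the pre-trained weights with a reduced learning rate:

```python
# Hypothetical fine-tuning settings; the real names in train.py may differ.
lr_init = 3e-6                                     # small learning rate for fine-tuning
load_model = "path/to/pretrained-checkpoint.pth"   # placeholder checkpoint path
```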
Inference:
You can choose to run inference with either your own customized model or with our pre-trained model. Our pre-trained model is available here. This model was trained on 5B tokens of OpenWebText.
1. Change the prompt in run.py to your custom prompt.
2. Run run.py.
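As a hypothetical illustration (the variable name and default text in the actual run.py may differ), editing the prompt looks like this:

```python
# Hypothetical excerpt from run.py: the string the model will continue from.
# Replace it with your custom prompt; the real variable name may differ.
context = "In a shocking finding, scientists discovered"
```

Then start generation with python run.py.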
If you find SpikeGPT useful in your work, please cite the following source:
@article{zhu2023spikegpt,
title = {SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks},
author = {Zhu, Rui-Jie and Zhao, Qihang and Li, Guoqi and Eshraghian, Jason K.},
journal = {arXiv preprint arXiv:2302.13939},
year = {2023}
}