
Sanger

This repository implements the framework proposed in the paper Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture (MICRO'21).

Overview

Sanger is a framework that harvests sparsity in the attention mechanism through synergistic hardware and software co-design. The software part prunes the attention matrix into a dynamic structured pattern, and the hardware part features a reconfigurable architecture that exploits this pattern.
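
As a rough illustration of the software side, the sketch below derives a dynamic sparsity mask from cheaply quantized attention scores and applies it to full-precision attention. The quantization scheme, threshold, and function names are illustrative assumptions, not Sanger's actual implementation.

```python
import torch

def sparse_attention_sketch(q, k, v, threshold=0.02):
    # Illustrative sketch only: the quantization scheme, threshold, and function
    # name are assumptions, not Sanger's exact implementation.
    d = q.size(-1)
    # Cheap prediction pass: coarsely quantized Q/K approximate the score matrix.
    q_lp, k_lp = torch.round(q * 8) / 8, torch.round(k * 8) / 8
    approx = torch.softmax(q_lp @ k_lp.transpose(-1, -2) / d ** 0.5, dim=-1)
    mask = approx > threshold
    # Keep at least the largest entry per row so no query is left unattended.
    mask |= approx == approx.max(dim=-1, keepdim=True).values
    # Full-precision pass: only unmasked entries matter (computed densely here for clarity).
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage with shape (batch, heads, seq_len, head_dim)
q = torch.randn(1, 12, 128, 64)
out = sparse_attention_sketch(q, torch.randn_like(q), torch.randn_like(q))
```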

Getting Started

Requirements

Installation

  1. Clone or download this repository
  2. Download the CLOTH dataset from here to data/cloth
  3. Create a virtual environment (either virtualenv or conda) with a Python version of at least 3.7.
  4. Install dependent Python packages: pip install -r requirements.txt
  5. Set relevant environment variables
    1. export PROJ_ROOT=<path-to-this-repo>
    2. export WANDB_ENABLED=true to enable wandb logging (optional)

Experiment Workflow

Hardware experiments

  1. Run the tests.
    1. cd into the hardware/ directory, run sbt and type test into the sbt console.
  2. Check the result.
    • The tests generate random data and the corresponding control signals for the three modules.
    • The output of the modules is compared with a directly computed ground truth.
    • The relative error should be below a threshold of 5%, which is small enough not to affect the final accuracy.
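
For reference, the sketch below shows the kind of relative-error check involved, written in Python rather than the Chisel testbench; the function name and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def passes_relative_error(hw_output, reference, tol=0.05):
    # Element-wise relative error against the directly computed ground truth.
    # The small epsilon guarding against division by zero is an assumption.
    rel_err = np.abs(hw_output - reference) / (np.abs(reference) + 1e-8)
    return bool(np.all(rel_err < tol))
```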

Software experiments

  1. Evaluate Sanger performance

    1. Train a model with Sanger sparse attention.

      We provide scripts for training in the scripts/ sub-directory. For example, to train a Sanger-pruned BERT-Base model on SQuAD, you can execute scripts/train_sparse_on_squad.sh. Note that you have to pass in an appropriate configuration file, which you can find in configs/. You can skip this step if you choose to load a fine-tuned checkpoint directly.

    2. Evaluate the fine-tuned model.

      We also provide scripts for evaluation in scripts/. For example, to evaluate the sparse model from the last step, you can execute scripts/eval_sparse_on_squad.sh. If you need to load a checkpoint from a non-standard location, be sure to change the path in the script. When the evaluation is complete, the script should print out the accuracy.

    3. Measure sparsity and load balance.

      Each evaluation script contains a flag that enables measuring the sparsity level of attention and calculating the load balance of the PE array. If you set this flag in the previous step, the script will log the results to a CSV file named load_balance.csv during evaluation.
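
      As a rough illustration of the two metrics (not the script's actual implementation), sparsity is the fraction of pruned attention entries, and load balance can be summarized as the ratio of average to peak work across PE rows. The definition of load balance and the PE-row count below are assumptions for illustration only.

      ```python
      import numpy as np

      def attention_sparsity(mask):
          # Fraction of attention entries pruned away (mask is a 0/1 array).
          return 1.0 - mask.mean()

      def load_balance(mask, num_pe_rows=16):
          # Illustrative definition (an assumption, not necessarily the paper's metric):
          # split query rows across PE rows and compare average to peak workload.
          work = mask.sum(axis=-1)  # non-zeros per query row
          per_pe = np.array([c.sum() for c in np.array_split(work, num_pe_rows)])
          return per_pe.mean() / per_pe.max()
      ```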

    4. Estimate the hardware performance of Sanger.

      We implement a simple simulator in bench_sanger.py that estimates the latency of executing an attention layer on Sanger, given the average sparsity and load balance. Executing this script reads the CSV file generated in the previous step and prints the average sparsity, load balance, and estimated latency.

  2. Comparison with dense attention and static sparse attention.

    1. Train a model with dense or static sparse attention.

      We provide dedicated scripts for training models with dense attention (e.g. scripts/train_dense_on_squad.sh). To train a model with static sparse attention, you can use the same script as Sanger and pass in an appropriate configuration file (e.g. bert_base_longformer.json).
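
      For contrast with Sanger's dynamic masks, a static pattern such as a Longformer-style sliding window is fixed in advance and independent of the input. A minimal sketch of such a mask follows; the window size is an illustrative choice, and the actual pattern is controlled by the configuration file.

      ```python
      import torch

      def sliding_window_mask(seq_len, window=4):
          # Static, input-independent band mask in the style of Longformer's local
          # attention; the window size here is illustrative only.
          idx = torch.arange(seq_len)
          return (idx[:, None] - idx[None, :]).abs() <= window
      ```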

    2. Evaluate the fine-tuned model.

      The process is similar to evaluating Sanger models. Note that you also need to use different scripts when evaluating dense models.

  3. Comparison with CPU and GPU.

    You can measure the latency of dense attention on CPU and GPU by executing bench_cpu_gpu.py.
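
    The sketch below shows one way such a measurement can be made with PyTorch; it illustrates the idea and is not the contents of bench_cpu_gpu.py (the shapes and iteration count are assumptions).

    ```python
    import time
    import torch

    def time_dense_attention(device, seq_len=512, d=64, heads=12, iters=100):
        # Time a dense full-precision attention layer; illustrative only,
        # not the actual bench_cpu_gpu.py implementation.
        q = torch.randn(1, heads, seq_len, d, device=device)
        k, v = torch.randn_like(q), torch.randn_like(q)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
            _ = attn @ v
        if device == "cuda":
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    print("CPU latency (s):", time_dense_attention("cpu"))
    if torch.cuda.is_available():
        print("GPU latency (s):", time_dense_attention("cuda"))
    ```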

Internals

Citation

Liqiang Lu, Yicheng Jin, Hangrui Bi, Zizhang Luo, Peng Li, Tao Wang, Yun Liang. Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture. The 54th International Symposium on Microarchitecture (MICRO’21), 2021.