nimish15shah / DAG_Processor

A DAG processor and compiler for a tree-based spatial datapath.
MIT License

Energy-efficient execution of irregular directed acyclic graphs

The paper proposes a customized parallel architecture for energy-efficient execution of irregular directed acyclic graphs (DAGs) from probabilistic machine learning and sparse linear algebra. A targeted compiler is developed to generate a binary program for the custom processor from an arbitrary DAG.

Aim of the experiments

The processor performance results reported in the paper are reproduced through SystemVerilog RTL simulation. The processor instructions for the target workloads are generated with the custom compiler, which validates the compilation algorithm.

This codebase includes the following components:

1) A SystemVerilog-based microarchitectural RTL model of the processor (in ./hw/rtl/) and a testbench (in ./hw/tb/).
2) A Python-based compiler (in ./src/).
3) Input DAGs to reproduce the experiments (in ./data/).

Dependencies

With Anaconda (Recommended)

# Installation
git clone git@github.com:nimish15shah/DAG_Processor.git
cd DAG_Processor
conda create --name DAGprocessor --file conda-linux-64.yml

# Run experiments
conda activate DAGprocessor
./run.sh 

Without Anaconda

You can also run the experiments without Anaconda, using a Python virtual environment instead (Python 3.7.7 is recommended):

# Installation
git clone git@github.com:nimish15shah/DAG_Processor.git
cd DAG_Processor
python3 -m venv venv_DAGprocessor
source ./venv_DAGprocessor/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

# Run experiments
./run.sh noconda 

Disk space requirement: Less than 200MB (not including Anaconda and Synopsys VCS installation).

Experiments runtime: 3-4 hours.

With zsh: run the script as zsh run.sh or zsh run.sh noconda.
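Because the experiments take several hours, it can help to confirm the prerequisites before starting. The following is a minimal sketch of such a pre-flight check, not a script shipped with this repository: the function names (have_cmd, free_mb, preflight) are hypothetical, and it assumes that the simulation step calls Synopsys VCS through the vcs command on PATH.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the DAG_Processor repository.

# Succeeds when the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in MB, of the filesystem that holds the given directory.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Reports every missing tool and returns non-zero if anything is missing.
preflight() {
    preflight_status=0
    # Assumption: run.sh needs git, python3 and the Synopsys VCS driver (vcs).
    for cmd in git python3 vcs; do
        if have_cmd "$cmd"; then
            echo "ok:      $cmd"
        else
            echo "missing: $cmd"
            preflight_status=1
        fi
    done
    # The experiments are stated to need less than 200 MB of disk space.
    if [ "$(free_mb .)" -lt 200 ]; then
        echo "warning: less than 200 MB free in $(pwd)"
        preflight_status=1
    fi
    return "$preflight_status"
}

preflight || echo "Resolve the items above before starting ./run.sh"
```

Run it from the DAG_Processor directory, after activating the conda or virtual environment, so that python3 resolves to the interpreter the experiments will use.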

Workflow

The run script performs the following steps:

Outputs

The output charts are available at ./out/plots.

Note: The large DAGs ("Large PC") from Table 1 in the paper are not evaluated in this version of the codebase because of their long experimental runtime (more than 24 hours).