qleenju / PDPU

PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
https://ieeexplore.ieee.org/document/10182007
Apache License 2.0
35 stars 7 forks source link
arithmetic-units dot-product mixed-precision posit posit-arithmetic posit-arithmetic-generator systemverilog unum

PDPU

An Open-Source Posit Dot-Product Unit (PDPU) for Deep Learning Applications

ISCAS 2023 [ Paper | Slide ]

Authors: Qiong Li, Chao Fang, and Zhongfeng Wang @ Nanjing University

[ English | 简体中文 ]

Overview

The proposed PDPU performs a dot-product of two input vectors $V_a$ and $V_b$ in low-precision format, and then accumulates the dot-product result and previous output $acc$ to a high-precision value $out$ as shown below: $$out = acc+V_a\times V_b = acc+a_0\cdot b_0+a_1\cdot b1+...+a{N-1}\cdot b_{N-1}$$

It introduces the following contributions and features:

Architecture

The architeture of PDPU equipped with a fined-grained 6-stage pipeline is depicted as follows:

Architecture of the proposed posit dot-product unit

The dataflow at each pipeline stage is as follows:

Getting Started

The PDPU is implemented using SystemVerilog, and the module hierarchy is as follows:

pdpu_top.sv                         # top module, combinationally implemented
pdpu_top_pipelined.sv               # PDPU equipped with a fine-grained 6-stage pipeline
├── registers.svh                   # register header file
├── pdpu_pkg.sv                     # package, packaging common functions, etc.
├── posit_decoder.sv                # posit decoder, extracting valid components of posit inputs
│   ├── pdpu_pkg.sv
│   ├── lzc.sv                      # leading zero count
│       └── cf_math_pkg.sv
│   └── barrel_shifter.sv           # barrel shifter
├── radix4_booth_multiplier.sv      # modified radix-4 booth wallace multiplier
│   ├── gen_prods.sv                # generate partial products
│       └── gen_product.sv          # generate a partial product according to booth encoding result
│           └── booth_encoder.sv    # radix-4 booth encoder
│   └── csa_tree.sv                 # recursive carry-save-adder (CSA) tree
│       ├── compressor_3to2.sv      # 3:2 compressor
│           └── fulladder.sv        # full adder
│       └── compressor_4to2.sv      # 4:2 compressor
│           └── counter_5to3.sv     # 5:3 counter
├── comp_tree.sv                    # recursive comparator tree
│   └── comparator.sv               # Comparator between two signed numbers
├── barrel_shifter.sv
├── csa_tree.sv             
│   ├── compressor_3to2.sv
│       └── fulladder.sv
│   └── compressor_4to2.sv
│       └── counter5to3.sv
├── mantissa_norm.sv                # mantissa normalization
│   ├── lzc.sv
│       └── cf_math_pkg.sv
│   └── barrel_shifter.sv
├── posit_encoder.sv                # posit encoder, packing result components into posit output
│   └── pdpu_pkg.sv
└── └── barrel_shifter.sv

Benefitting from the highly parameterized sub-modules, PDPU can be configured from several aspects, i.e., posit formats, dot-product size, and alignment width.

Publication

If you find PDPU helpful in your work, please cite us:

@inproceedings{li2023pdpu,
  title={PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications},
  author={Li, Qiong and Fang, Chao and Wang, Zhongfeng},
  booktitle={2023 IEEE International Symposium on Circuits and Systems (ISCAS)},
  year={2023},
  organization={IEEE}
}