========================================================
_____________ ___ __ ___
/ ___/ ___/ _ \/ _ | / |/ /__ ____ ___ ___ ____
/ /__/ (_ / , _/ __ | / /|_/ / _ `/ _ \/ _ \/ -_) __/
\___/\___/_/|_/_/ |_| /_/ /_/\_,_/ .__/ .__/\__/_/
/_/ /_/
========================================================
This is a CGRA (Coarse-Grained Reconfigurable Architecture) mapper to map the target loops onto the CGRA. The CGRA is parameterizable (e.g., CGRA size, type of the computing units in each tile, communication connection, etc.). Different advanced mapping strategies are built on top of this basic mapper. CGRA Mapper currently provides following features and functionalities:
- It takes the arch&kernel info in
JSON
format.
- It can generate the DFG/CDFG of the target code region (in
.png
).
- Nested-loop and complex if/else control flows are supported with partial predication.
- Users can easily invoke loop-unrolling in the compile/run script (
opt --loop-unroll --unroll-count=4 -load PATH/libmapperPass.so -mapperPass kernel.bc
).
- It schedules and maps the DFG onto the CGRA arch that is represented in MRRG.
- The generated dfg.json and config.json can be taken as inputs for the simulation in the OpenCGRA (register index needed to be manually added/distinguished).
- Benchmark including a set of representative kernels/applications with compilation scripts can be found here.
Docker
The docker image is available here.
Showcase
// target FIR kernel
for (i = 0; i < NTAPS; ++i) {
sum += input[i] * coefficient[i];
}
Citation
@inproceedings{tan2020opencgra,
title={OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs},
author={Tan, Cheng and Xie, Chenhao and Li, Ang and Barker, Kevin J and Tumeo, Antonino},
booktitle={2020 IEEE 38th International Conference on Computer Design (ICCD)},
pages={381--388},
year={2020},
organization={IEEE}
}
License
CGRA-Mapper is offered under the terms of the Open Source Initiative BSD 3-Clause License. More information about this license can be found here:
Build
The mapper requires the following additional prerequisites:
$ mkdir build
$ cd build
$ cmake ..
$ make
Execution
-
The pass should be built and run with the same version of the LLVM.
-
A param.json indicating the configuration of the target CGRA and the kernel should locate in the kernel folder. Explaination of each field in the param.json:
- kernel: indicating the function name shown in the generated IR file. Note that different versions of LLVM could lead to different generated function names, which should be explicitly indicated in the
param.json
by the users.
- targetFunction: whether targeting the entire function or only the loop. Set it as
false
as CGRA mainly focus on loop acceleration.
- targetNested: whether targeting nested loop or not. For now, the nested loop is therotically supported but not efficient. Set it as
false
to target the inner-most loop.
- targetLoops: indicating the loops need to be mapped/accelerated. One function might contains multiple loops. However, for now, we can only target single loop. So this field can be set as
[loopID]
(e.g., [0]
or [1]
). If it is set as [0, 1]
, the first loop (i.e., loop 0
) will be selected.
- doCGRAMapping: indicating whether the mapping is performed. If you only care about the statistics (e.g., number of nodes/edges, loop-carry dependency length, number of loop-carry dependencies) of the loop DFG without mapping, this field can be set as
false
.
- row: the number of rows in the CGRA.
- column: the number of columns in the CGRA.
- precisionAware: whether distinguishing floating-point computation from fixed-point computation.
- heterogeneity: deprecated. Set as
false
and ignore it.
- isTrimmedDemo: simplifying the generated DFG (
.dot
).
- heuristicMapping:
true
indicates heuristic mapping while false
indicates exhaustive mapping. The heuristic mapping runs much faster than the exhaustive one but cannot guarantee an optimal solution.
- parameterizableCGRA: used to integrate with CGRA-Flow. Set as
false
by default.
- diagonalVectorization:
true
indicates half of the tiles (e.g., in a 16-tile CGRA, tile0, tile2, tile4, ..., tile14) additionally support vectorized operations, while false
means all the tiles support vectorized operations. If the target function doesn't contain any vectorized operation, true
or false
in this field doesn't make any difference.
- bypassConstraint: Additional constraint to limit the max number of data streams can go through a router/crossbar simultaneously in one cycle. Normally, this field should be set as the number of ports on the crossbar (e.g., 2 for a ring, 4 for a mesh, and 8 for a king-mesh).
- isStaticElasticCGRA: used to map a kernel/DFG on the Ultra-Elastic CGRA. Set as
false
by default.
- ctrlMemConstraint: should be set as II (at least). So a larger number is prefered, which probably leads to a valid mapping solution.
- regConstraint: the number of registers used to temporarily hold the arrived data for later computation. Set as 8 by default.
- optLatency: used to support multi-cycle execution. If this field is not specified, every operation is done in one single-cycle. Note that there is currently no hardware support for this feature, which is supposed to be used for performance exploration only.
- optPipelined: used to enable pipelined execution of the multi-cycle operation (i.e., indicated in optLatency).
- additionalFunc: used to enable specific functionalities on target tiles. Normally, we don't need to set this field as all the tiles already include most functionalities. By default, the
ld
/st
is only enabled on the left most tiles. So if you wanna enable the memory access on the other tiles, this field needs to be provided.
- incrementalMapping
true
indicates incremental mapping while false
indicates heuristic/exhaustive mapping. Incremental mapping re-utilizes the previous mapping results of current kernel (e.g., on 4x4 CGRA) to accelerate its mapping on the new resource allocation decisions (e.g., on 5x5 CGRA). To simply check the acceleration effect of incremental mapping, calls heuristic mapping first to generate increMapInput.json
for current kernel on 4x4 CGRA, then sets incrementalMapping to true
and performs mapping on 5x5 CGRA again, finally checks the elapsed time differences.
-
Run:
% opt -load ~/this repo/build/mapper/libmapperPass.so -mapperPass ~/target benchmark/target_kernel.bc
Related publications
- Cheng Tan, et al. “DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.” The 39th IEEE International Conference on Computer Design. (ICCD'21), Oct 2021.
- Cheng Tan, et al. “OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.” The 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'21), A Virtual Conference, July 7-8, 2021.
- Cheng Tan, et al. "ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing." IEEE Transactions on Parallel and Distributed Systems (TPDS'21).
- Cheng Tan, et al. “AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators.” The 2021 Design, Automation & Test in Europe Conference, Grenoble, France. (DATE'21) February 1-5, 2021.
- Christopher Torng, et al. "Ultra-Elastic CGRAs for Irregular Loop Specialization." 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA'21).