This PR adds the --strategy command-line option for selecting a compiler strategy: either the current (default) strategy or one of three new ones.
local-isolated is the current and default strategy; it assumes local memory is not shared between layers;
local-vars is a new strategy that forces the use of local memory to share vars between layers;
local-consts is a new strategy that forces the use of local memory to share consts, assuming they are preloaded before the program runs;
local-vars-and-consts is a new strategy that combines local-vars with local-consts.
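One way to read the four strategy names is as two independent choices: whether vars are shared through local memory, and whether consts are. The sketch below makes that decomposition explicit. The names `StrategyFlags`, `STRATEGIES`, and `parse_strategy` are hypothetical and are not taken from the compiler; only the four strategy strings and the default come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyFlags:
    """Hypothetical view of a strategy as two independent choices."""
    share_vars: bool    # vars are shared between layers through local memory
    share_consts: bool  # consts are assumed preloaded into local memory


# The four values accepted by --strategy, as listed in this PR.
STRATEGIES = {
    "local-isolated": StrategyFlags(share_vars=False, share_consts=False),
    "local-vars": StrategyFlags(share_vars=True, share_consts=False),
    "local-consts": StrategyFlags(share_vars=False, share_consts=True),
    "local-vars-and-consts": StrategyFlags(share_vars=True, share_consts=True),
}

DEFAULT_STRATEGY = "local-isolated"


def parse_strategy(name=None):
    """Return the flags for a strategy name, falling back to the default."""
    if name is None:
        name = DEFAULT_STRATEGY
    try:
        return STRATEGIES[name]
    except KeyError:
        raise ValueError(f"unknown strategy: {name!r}") from None
```

Seen this way, local-vars-and-consts is simply the case where both flags are set, which matches its description as the combination of local-vars and local-consts.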
The following are performance numbers for ResNet20/CIFAR on ZCU104, both emulated and on FPGA:
| Strategy | Emulator* cycles (K) | FPGA latency (ms) | Minimum required local memory (kV) |
| --- | --- | --- | --- |
| local-isolated | 292 | 5.878 | 2** |
| local-vars | 236 | 4.663 | 8 |
| local-consts | 273 | 4.32 | 26 |
| local-vars-and-consts | 217 | 3.407 | 26 |
(*) The emulator is set to estimate DRAM latency as 1 cycle per vector.
(**) This is bounded by the largest root in the model (conv2d_16 in layer 21).
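To make the relative gains easier to compare, the following sketch computes each strategy's speedup over local-isolated from the figures reported above. The data is copied from the table; the `RESULTS` and `speedups` names are illustrative only.

```python
# Measurements copied from the table above:
# strategy -> (emulator cycles in K, FPGA latency in ms)
RESULTS = {
    "local-isolated": (292, 5.878),
    "local-vars": (236, 4.663),
    "local-consts": (273, 4.32),
    "local-vars-and-consts": (217, 3.407),
}

BASELINE = "local-isolated"


def speedups(results, baseline=BASELINE):
    """Return strategy -> (cycle speedup, latency speedup) versus the baseline."""
    base_cycles, base_latency = results[baseline]
    return {
        name: (base_cycles / cycles, base_latency / latency)
        for name, (cycles, latency) in results.items()
    }


if __name__ == "__main__":
    for name, (cyc, lat) in speedups(RESULTS).items():
        print(f"{name:<22} cycles x{cyc:.2f}  latency x{lat:.2f}")
```

With these numbers, local-vars-and-consts needs roughly 1.35x fewer emulator cycles and has roughly 1.73x lower FPGA latency than local-isolated.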
The PR also includes the following smaller changes:
Fix to find true roots. Previously, intermediate save nodes were treated as roots, which occasionally emitted redundant computation, although this had no significant effect on overall performance;
Split Scheduler into IsolatedLocalScheduler and SharedLocalScheduler. The former captures the current scheduling behavior, which splits computation into multiple stages and partitions. The latter is simplified to produce a single stage and partition while sharing local memory space between layers;
Introduce more decoupling between compiler components. Frontends now only depend on the HIR and the MemoryManager, where previously they additionally depended on the Scheduler and the Backend;
Introduce NilHIR and externalize memory allocators and spaces from MemoryManager to allow for two-pass compilation;
Clear TODOs for memory management of pinned objects to use MemoryManager.ReservedConsumers. Fix corresponding memory leaks and require empty allocator at the end of compilation;
Add emulator support for the --local-consts command-line option. When set to true, the emulator loads consts into local memory instead of DRAM1;
Introduce FrontendGraphPrinter as an option in EmitContext to allow for two-pass compilation.
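The requirement that the allocator be empty at the end of compilation can be illustrated with a small, self-contained sketch. It is not the MemoryManager implementation; `TrackingAllocator` and its methods are hypothetical names, and the sketch only assumes that every allocation is recorded against an owner and must be freed before compilation finishes.

```python
class TrackingAllocator:
    """Hypothetical allocator that records which owner holds each address."""

    def __init__(self, size):
        self.free_addresses = list(range(size))
        self.owners = {}  # address -> name of the object holding it

    def allocate(self, owner, count):
        """Hand out `count` addresses and record `owner` for each of them."""
        if count > len(self.free_addresses):
            raise MemoryError(f"cannot allocate {count} vectors for {owner}")
        taken = self.free_addresses[:count]
        del self.free_addresses[:count]
        for address in taken:
            self.owners[address] = owner
        return taken

    def free(self, addresses):
        """Return addresses to the free list."""
        for address in addresses:
            del self.owners[address]  # raises KeyError on a double free
            self.free_addresses.append(address)

    def is_empty(self):
        return not self.owners

    def require_empty(self):
        """Fail loudly if anything is still allocated."""
        if not self.is_empty():
            leaked = sorted(set(self.owners.values()))
            raise RuntimeError(f"leaked allocations held by: {leaked}")
```

Calling `require_empty()` as the last step of a compilation run turns a silent leak, such as a pinned object that was never released, into an immediate error that names the owner.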