tensil-ai / tensil

Open source machine learning accelerators
https://www.tensil.ai

Report accurate local and accumulator usage #79

Closed petrohi closed 1 year ago

petrohi commented 1 year ago

This PR introduces accurate reporting of local memory and accumulator usage. Previously, only an upper-bound estimate was reported. The compiler summary now reports the maximum and aggregate usage across all layers: the aggregate is the total size of all objects allocated, and the maximum is the high-water mark. The layers summary reports a utilization percentage per layer, computed as the ratio of the maximum memory usage to the memory's architectural depth.
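The difference between the two figures can be illustrated with a minimal sketch. This is not the Tensil compiler's code: the `UsageTracker` class and its `allocate`/`free` methods are hypothetical names, and the sketch assumes that sizes and depth are counted in the same unit.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Hypothetical usage tracker for one memory (e.g. local or accumulators)."""

    depth: int          # architectural depth of the memory (assumed unit)
    current: int = 0    # size allocated right now
    maximum: int = 0    # high-water mark of `current`
    aggregate: int = 0  # total size of all objects ever allocated

    def allocate(self, size: int) -> None:
        self.current += size
        self.aggregate += size
        self.maximum = max(self.maximum, self.current)

    def free(self, size: int) -> None:
        # Freeing lowers current usage but never the maximum or the aggregate.
        self.current -= size

    @property
    def utilization_percent(self) -> float:
        # Ratio between the maximum usage and the architectural depth.
        return 100.0 * self.maximum / self.depth


tracker = UsageTracker(depth=1000)
tracker.allocate(300)
tracker.allocate(200)  # current = 500, maximum = 500
tracker.free(300)      # current = 200
tracker.allocate(100)  # current = 300, maximum stays at 500

print(tracker.maximum, tracker.aggregate, tracker.utilization_percent)
```

In this toy run the aggregate (600) exceeds the maximum (500) because freed space is reused, which is the same relationship visible in the tables for local memory and accumulators.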

In addition, the compiler summary now includes the maximum number of stages and partitions across all layers. When both are equal to one, the user may consider using a strategy other than the default (local-isolated).

The following are usage values for the various memories and compiler strategies for ResNet20/CIFAR on ZCU104.

Maximum

| Strategy | DRAM0 | DRAM1 | Local | Accumulators |
|---|---|---|---|---|
| local-vars-and-consts | 1,024 | 0 | 25,971 | 8,196 |
| local-vars | 1,024 | 18,803 | 7,238 | 8,196 |
| local-consts | 7,168 | 0 | 25,971 | 8,196 |
| local-isolated | 7,168 | 18,803 | 7,238* | 8,196* |

(*) These maximums are for the case of one stage with one partition. The local-isolated strategy can split the work into more stages and partitions to fit smaller memories.
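As a rough illustration of how the maximums might be read, the sketch below lists the strategies whose single-stage, single-partition maximum stays within a given local and accumulator depth. The usage values are copied from the Maximum table. The function name `within_depth` and the depths passed to it are hypothetical and are not the actual ZCU104 architecture parameters. Since local-isolated can split stages and partitions, it may still work when its single-partition maximum exceeds the depth.

```python
# Maximum usage for ResNet20/CIFAR on ZCU104, copied from the Maximum table.
MAXIMUM_USAGE = {
    "local-vars-and-consts": {"local": 25_971, "accumulators": 8_196},
    "local-vars": {"local": 7_238, "accumulators": 8_196},
    "local-consts": {"local": 25_971, "accumulators": 8_196},
    "local-isolated": {"local": 7_238, "accumulators": 8_196},
}


def within_depth(local_depth: int, accumulator_depth: int) -> list[str]:
    """Return strategies whose maximum usage does not exceed the given depths.

    Hypothetical helper: it only compares the reported maximums with the
    depths and ignores the ability of local-isolated to split the work.
    """
    return [
        name
        for name, usage in MAXIMUM_USAGE.items()
        if usage["local"] <= local_depth
        and usage["accumulators"] <= accumulator_depth
    ]


# Hypothetical depths chosen only for illustration.
print(within_depth(local_depth=16_384, accumulator_depth=16_384))
```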

Aggregate

| Strategy | DRAM0 | DRAM1 | Local | Accumulators |
|---|---|---|---|---|
| local-vars-and-consts | 1,024 | 0 | 47,996 | 54,851 |
| local-vars | 1,024 | 18,803 | 48,010 | 54,851 |
| local-consts | 29,193 | 0 | 75,652 | 54,851 |
| local-isolated | 29,193 | 18,803 | 75,666 | 54,851 |

shortcut-integration[bot] commented 1 year ago

This pull request has been linked to Shortcut Story #476: Report accurate local and accumulator usage.