Move precision and SIMD vector types from the DNN and BLAS libraries to the Snitch runtime.
Rename `snrt_l1alloc` and `snrt_l3alloc` to `snrt_l1_alloc` and `snrt_l3_alloc`, respectively, in line with all other alloc functions.
Replace the `size` field with an `end` field in `snrt_allocator_t`, for faster bound checks.
Add `alloc_v2` functions: allocator structs are now core-local rather than shared, for faster access. This imposes the constraint that all cores must update their allocator pointers to keep them aligned (trading additional computation for performance). To simplify this, we provide the convenience functions `snrt_l1_alloc_cluster_local` and `snrt_l1_alloc_compute_core_local`, which must be called by every core.
Split `snrt_global_barrier` into `snrt_inter_cluster_barrier` plus a cluster barrier.
Add `snrt_cluster_is_last_compute_core` function, which may be used to handle remainder iterations on the last compute core.
Move reusable functions in `start.c` to `start.h` for proper inlining. Move all defines to `*_start.h` and include this in `*_start.c` to ensure consistent definitions.