Currently, our cache read/write primitives do not analyze read/write regions recursively. This means that when we cache read A in the following block:
with T.block('blk'):
    T.read(A[indptr[vi], 0:128], indptr[vi])
    ...
the cache read instruction generates only:
with T.block('A_shared'):
    T.read(A[indptr[vi], 0:128])
    T.write(A_shared[indptr[vi], 0:128])
    ...
with T.block('blk'):
    T.read(A_shared[indptr[vi], 0:128], indptr[vi])
    ...
Note that indptr[vi] does not appear in the read region of the cache read block, even though the block needs it to index A. This is a problem if we want to further cache read the indptr array for the A_shared block (a common optimization in sparse kernel libraries).
Solution
Add an extra argument, recursive_analysis, to the (reverse) cache read/write instructions to enable recursive read/write region analysis. The argument defaults to False, so current behavior is unaffected.
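To illustrate what the flag is meant to change, here is a minimal, self-contained sketch of region collection on a toy expression model. It is not the TVM implementation and does not use any TVM API: `Var`, `Load`, `show`, and `read_regions` are hypothetical names invented for this example. Only the `recursive_analysis` flag and its `False` default come from the proposal above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A loop/block variable such as vi (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Load:
    """A buffer access such as indptr[vi] or A[indptr[vi], 0:128]."""
    buffer: str
    indices: Tuple["Index", ...]


# A plain string stands in for a slice such as "0:128".
Index = Union[Var, Load, str]


def show(ix: Index) -> str:
    """Render an index or access back to its textual form."""
    if isinstance(ix, Var):
        return ix.name
    if isinstance(ix, Load):
        return f"{ix.buffer}[{', '.join(show(i) for i in ix.indices)}]"
    return ix


def read_regions(access: Load, recursive_analysis: bool = False) -> List[str]:
    """Collect the regions read by `access`.

    With recursive_analysis=False only the outermost access is reported,
    mirroring the current behavior described above. With True, buffer
    accesses nested inside the indices are reported as well.
    """
    regions = [show(access)]
    if recursive_analysis:
        for ix in access.indices:
            if isinstance(ix, Load):
                for region in read_regions(ix, recursive_analysis=True):
                    if region not in regions:
                        regions.append(region)
    return regions


vi = Var("vi")
a_access = Load("A", (Load("indptr", (vi,)), "0:128"))

print(read_regions(a_access))
# ['A[indptr[vi], 0:128]']
print(read_regions(a_access, recursive_analysis=True))
# ['A[indptr[vi], 0:128]', 'indptr[vi]']
```

In this toy model, the second call reports indptr[vi] alongside the access to A. That is the extra read region the A_shared block would need to declare before indptr itself can be cache read.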