Closed yzh119 closed 1 year ago
The Bug
This PR fixes issue #90, which occurs because our CompactBufferRegion pass cannot infer the extent of C_local when the access involves non-affine expressions, so it generates:
allocate(C_local_8: Pointer(local float32), float32, [(min(1710902, ((max(I_1_3_indices_data_1[((i_1_3_1: int32*8) + threadIdx.y_3: int32)], I_1_3_indices_data_1[0]) + 1) - min(I_1_3_indices_data_1[((i_1_3_1*8) + threadIdx.y_3)], I_1_3_indices_data_1[0])))*2)]), storage_scope = local;
which should be:
allocate(C_local_8: Pointer(local float32), float32, [2]), storage_scope = local;
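To make the size of the problem concrete, here is a minimal sketch in plain Python (not TVM code) that evaluates the extent expression from the first allocate above, read literally. The function name inferred_extent and the sample index data are hypothetical; only the formula and the constant 1710902 come from the generated code.

```python
def inferred_extent(indices, i, thread_y, cap=1710902):
    """Evaluate the extent that appears in the buggy allocate statement.

    indices  : stand-in for I_1_3_indices_data_1 (hypothetical sample data)
    i        : stand-in for the loop variable i_1_3_1
    thread_y : stand-in for threadIdx.y_3
    cap      : the constant bound 1710902 from the generated code
    """
    k = i * 8 + thread_y
    lo = min(indices[k], indices[0])
    hi = max(indices[k], indices[0])
    # The span between two data-dependent indices, capped, times 2.
    return min(cap, hi + 1 - lo) * 2


# Hypothetical index data: the extent depends entirely on the runtime values.
sample = [3, 3, 10, 500000]
small = inferred_extent(sample, 0, 1)   # both indices equal
large = inferred_extent(sample, 0, 3)   # indices far apart
```

Under these assumptions the expression only equals the intended extent of 2 when the two loaded indices coincide; otherwise it grows with the distance between them, up to the cap, which is far too large for a local buffer.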
Solution
This PR uses reverse_cache_read (we are upstreaming its generalized form, reindex_cache_read/write; see https://github.com/apache/tvm/pull/14161), whose generated buffer does not rely on CompactBufferRegion to determine its extent.
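As a rough illustration of why such a cache buffer can have a statically known extent, the sketch below uses plain Python rather than TIR. It is not the reverse_cache_read or reindex_cache_read API, and it assumes that the cache is laid out by loop variables instead of by the original data-dependent coordinates; the names gather_via_loop_indexed_cache, src, and feat are hypothetical.

```python
def gather_via_loop_indexed_cache(src, indices, k, feat=2):
    """Stage `feat` elements into a small local cache.

    The cache position is the loop variable j, so the cache size is the
    loop extent (feat), independent of the values stored in `indices`.
    """
    local = [0.0] * feat          # extent known without inspecting indices
    base = indices[k] * feat      # data-dependent (non-affine) source offset
    for j in range(feat):
        local[j] = src[base + j]  # only the source side uses the indirect index
    return local


# Hypothetical data: 10 rows of 2 values each, gathered through an index array.
src = [float(v) for v in range(20)]
cached = gather_via_loop_indexed_cache(src, [3, 7], 1)
```

In this toy version the indirect index appears only on the read from src, so nothing about the cache's shape has to be inferred from a non-affine access region.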