Problem with current design
Currently, the default behavior of the `blockize` primitive is to place the `init` block in the outer block when we blockize a reduction block. However, this might not be the desired behavior when the block iter vars in the outer block are all data parallel. In this case, the inner block is neither a local complete block nor a local reduction block, and we cannot bind the loop surrounding the inner block to any physical threads.
Proposal
Add an extra argument `inner_init`, which defaults to `False`. When set to `True`, `blockize` checks whether the outer loops are used in any reduction iter vars. If they are not, it creates a new block starting from the given loop and places the init block inside the created inner block.
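The check on the outer loops can be sketched in plain Python. This is a toy model rather than TVM code: `IterVar`, `bound_loops`, and `can_use_inner_init` are hypothetical names introduced only for illustration, and an actual implementation would have to inspect the iter var bindings in the IR instead of sets of loop names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterVar:
    """Toy stand-in for a block iter var (hypothetical, not a TVM class)."""
    name: str
    is_reduction: bool
    bound_loops: frozenset  # names of the loops that appear in its binding


def can_use_inner_init(iter_vars, outer_loops):
    """Return True when no outer loop is used by a reduction iter var."""
    outer = set(outer_loops)
    return not any(iv.is_reduction and (iv.bound_loops & outer) for iv in iter_vars)


# Tiled matmul, loop order i_o, j_o, i_i, j_i, k; blockize at i_i.
# The loops left outside the new block are i_o and j_o (both data parallel).
matmul = [
    IterVar("vi", False, frozenset({"i_o", "i_i"})),
    IterVar("vj", False, frozenset({"j_o", "j_i"})),
    IterVar("vk", True, frozenset({"k"})),
]
spatial_outer = can_use_inner_init(matmul, ["i_o", "j_o"])

# Split reduction, loop order i, k_o, j, k_i; blockize at j.
# k_o stays outside the new block and is part of vk's binding.
split_k = [
    IterVar("vi", False, frozenset({"i"})),
    IterVar("vj", False, frozenset({"j"})),
    IterVar("vk", True, frozenset({"k_o", "k_i"})),
]
reduction_outer = can_use_inner_init(split_k, ["i", "k_o"])
```

In the first case the outer loops only feed data-parallel iter vars, so the init block could live inside the inner block. In the second case an outer loop contributes to the reduction, so the condition in the proposal does not hold.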