Open Hzfengsy opened 3 years ago
This subsection is based on @yzh119's proposal #420 #426.
Here is an example of how my pylint complains about the things above:
tvm.script as the “root” namespace for all TVM script related stuff
tvm.script.tir for TIR, and idiomatically import it as T, like Keras is usually imported as K
tvm.script.relax for Relax, and idiomatically import it as R
tvm.script.IRModule for IRModule
T.PrimFunc for tir.PrimFunc
R.Function for relax.Function
With the proposal above, we are able to provide type stubs, so that users' TVM scripts work well with linting and auto-completion.
Here is an example of the proposed syntax:
import tvm
from tvm.script import tir as T
# ^ there is a broadly accepted precedent for this in the Python community: from keras import backend as K

@tvm.script.IRModule  # so it generates an IRModule
class Module:
    @T.PrimFunc  # it generates a PrimFunc
    def func(a: T.handle, b: T.handle, c: T.handle) -> None:
        A = T.match_buffer(a, [128, 128], dtype="float32")  # stub provided for tvm.script.tir.match_buffer
        B = T.match_buffer(b, [128, 128], dtype="float32")
        C = T.match_buffer(c, [128, 128], dtype="float32")
        with T.block([128, 128, T.reduce_axis(0, 128)], "C") as [i, j, k]:  # stub provided for tvm.script.tir.block
            C[i, j] = T.if_then_else(  # stub provided for tvm.script.tir.if_then_else
                i == 0 and j == 0 and k == 0,
                0.0,
                C[i, j] + A[i, k] * B[k, j],
                dtype="float32",
            )

>>> print(type(Module))
<class 'tvm.ir.module.IRModule'>
>>> print(type(Module["func"]))
<class 'tvm.tir.function.PrimFunc'>
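The two `type(...)` results above follow from the decorators replacing the decorated objects instead of returning them unchanged. Below is a minimal, TVM-free sketch of that mechanism. `MockIRModule`, `MockPrimFunc`, `ir_module`, and `prim_func` are hypothetical stand-ins for illustration only; they are not TVM APIs, and a real implementation would parse the function source rather than just record its name.

```python
class MockPrimFunc:
    """Hypothetical stand-in for a parsed PrimFunc; not a TVM class."""

    def __init__(self, name):
        self.name = name


class MockIRModule:
    """Hypothetical stand-in for an IRModule: a mapping from name to function."""

    def __init__(self, functions):
        self.functions = dict(functions)

    def __getitem__(self, name):
        return self.functions[name]


def prim_func(func):
    # A real parser would inspect the function's source; this sketch keeps only the name.
    return MockPrimFunc(func.__name__)


def ir_module(cls):
    # Collect every class attribute that prim_func already converted,
    # then return a module object in place of the class itself.
    functions = {
        name: value
        for name, value in vars(cls).items()
        if isinstance(value, MockPrimFunc)
    }
    return MockIRModule(functions)


@ir_module
class Module:
    @prim_func
    def func(a, b, c):
        pass


print(type(Module).__name__)          # prints MockIRModule
print(type(Module["func"]).__name__)  # prints MockPrimFunc
```

After decoration, `Module` is no longer a Python class, which is why `type(Module)` reports the module type in the example above.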
Here is the philosophy behind the proposed design
G1. The complete form
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", iter_dom_ndim=3) as [vi, vj, vk]:
        T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
        T.iter_dom_dim(var=vj, type='S', dom=512, bind=j)
        T.iter_dom_dim(var=vk, type='R', dom=512, bind=k)
        T.reads(...)
        T.writes(...)
G2. With full trivial bindings
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", iter_dom_ndim=3, trivial_bind="SSR") as [i, j, k]:  # <= redefinition treated as binding
        T.reads(...)
        T.writes(...)
G3. With partial trivial bindings
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:
        T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
        T.reads(...)
        T.writes(...)
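One way to read the `trivial_bind` string in G2 and G3: each character describes one block axis, where `S` or `R` marks a spatial or reduction axis that is trivially bound to the loop variable in the same position, and `.` marks an axis that is declared explicitly in the block body. The sketch below encodes that reading in plain Python. `split_trivial_bind` is a hypothetical helper for illustration, not part of the proposal.

```python
def split_trivial_bind(pattern, loop_vars):
    """Split block axes into trivially bound ones and explicitly declared ones."""
    if len(pattern) != len(loop_vars):
        raise ValueError("pattern length must match the number of block axes")
    trivial = []   # (axis position, iter type, loop var the axis binds to)
    explicit = []  # axis positions that still need an explicit declaration
    for position, (kind, var) in enumerate(zip(pattern, loop_vars)):
        if kind in ("S", "R"):
            trivial.append((position, kind, var))
        elif kind == ".":
            explicit.append(position)
        else:
            raise ValueError(f"unknown axis marker: {kind!r}")
    return trivial, explicit


trivial, explicit = split_trivial_bind(".SR", ["i", "j", "k"])
print(trivial)   # [(1, 'S', 'j'), (2, 'R', 'k')]
print(explicit)  # [0]
```

With `"SSR"` (G2) the explicit list is empty, and with no trivial bindings at all (G1) every axis needs its own declaration.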
G4. No automatic loop induction
Generating loops on top of blocks looks a bit weird in terms of semantics, even though totally conveyable with extra documentation. With our binding design, we don't actually need this powerful tool.
It would be great to discuss a few candidates for blocks and block bindings. I labeled @junrushao1994's proposal as B0; let us also list the current definition and the new proposals, so we have a clear basis for discussion.
Note that this form discards the desire of putting iterators on the block, but instead focuses on getting some information right in the block body.
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        # the API name is subject to change
        vi = T.axis.S(512, i)
        vj = T.axis.S(512, j)
        vk = T.axis.R(512, k)
        T.reads(...)
        T.writes(...)
Note that the API name can change.
block_var = match_axis_pattern(domain, value) represents the value mapping; this is consistent with our use of match_buffer.
The name match_axis_pattern is subject to change; there are a few choices here:
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", map_axis=[i, j, k]):
        C[i, j] += A[i, k] * B[j, k]
Key design pts:
Another alternative (add mapping property declarations):
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", map_spatial_axis=[i, j], map_reduce_axis=[k]):
        C[i, j] += A[i, k] * B[j, k]
As we extend to future iteration patterns, we might want to introduce additional constraints, where the iterators can no longer be declared separately. As a mock-up example, we might introduce the concept of an axis group to declare a non-trivial relation among three axes that need to be declared together. We need to think about how our convention extends to this case.
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.sparse.axis_group(
            [512, 512, 512], "Dense,Sparse,Dense",
            [value0, value1, value2]
        )
        T.reads(...)
        T.writes(...)
This is the current form
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", [512, 512, T.reduce_axis(512)]) as [vi, vj, vk]:
        # the API name is subject to change
        T.bind(vi, i)
        T.bind(vj, j)
        T.bind(vk, k)
        T.reads(...)
        T.writes(...)

with T.block([512, 512, T.reduce_axis(512)]) as [vi, vj, vk]:
    C[vi, vj] += A[vi, vk] * B[vj, vk]
Thanks for the great discussion and proposals. Here are two major points, in my opinion.
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi = T.axis.S(i, 512)
        vj = T.axis.S(j, 512)
        vk = T.axis.R(k, 512)
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)
Not needed in this format
Thanks @tqchen and @Hzfengsy for the proposals!
First of all, we seem to converge to a point where we don't want the with statement to contain all the block information, which can be overwhelming to a certain extent: imagine a conv2d with 3 spatial axes and 4 reduction axes; it is unrealistic to put them all on a single line without causing confusion.
On the syntax of a block binding, I listed proposals B0, B1 and B3 below for detailed comparison:
# Syntax in B0
T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
# Syntax in B1
vi = T.axis.S(i, 512)
# Syntax in B3
vi = T.axis.S(512, i)
Both B1 and B3 treat bindings as assignments, which from my PoV is not a big problem and looks cleaner (PL guys might disagree). Also, both B1 and B3 seem to use standalone scoping for these bindings, which I feel is better than B0.
The difference between B1 and B3 is the order of arguments. I would prefer B3, which makes it easier for users to write a fragment where a Block exists without a BlockRealize.
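One plausible reading of the argument-order point: with the domain first, the binding can become an optional trailing argument, so a block axis can be declared before any binding value exists. The sketch below illustrates that reading in plain Python. `BlockAxis` and `spatial` are hypothetical names for illustration, not TVM APIs, and treating the binding as optional is an assumption rather than something the thread states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockAxis:
    kind: str                   # "S" for spatial, "R" for reduction
    dom: int                    # extent of the iteration domain
    bind: Optional[str] = None  # bound value; None models a Block without BlockRealize


def spatial(dom, bind=None):
    # B3 order: domain first, so the binding can simply be omitted.
    return BlockAxis("S", dom, bind)


bound = spatial(512, "i")  # axis declared together with its binding
unbound = spatial(512)     # axis declared with no binding yet
print(bound)
print(unbound)
```

With the B1 order (`bind` first, `dom` second), the required argument would come last, so omitting the binding would need a keyword argument or a placeholder value.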
One thing I am not so sure about is naming. As @Hzfengsy said, we would love the syntax itself to convey the design philosophy (let users know there are block vars and bindings), so I feel strongly that we should emphasize the concept of "block domain", or "iteration domain of the block". Therefore I would love to propose the following:
# B4. The new proposal
vi = T.block_domain.S(domain=512, bind=i)
# In the doc, which pops up almost instantly in users' vscode/vim/other IDEs
# we can say this is shortcut for `T.block_domain.spatial_axis`
Looks like we have 3 different proposals here:
# Syntax in B0
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:  # <= redefinition treated as binding
        T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
        T.reads(...)
        T.writes(...)

# Syntax in B1
for i, j, k in T.grid(512, 512, 512):
    with T.block("C", map_axis=[i, j, k]):
        C[i, j] += A[i, k] * B[j, k]

# Syntax in B3
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)
Below is my understanding:
Therefore, I would love to go with B3, with some minor naming changes to make sure our definition is always focused on one and only one concept: "block domain". Here is my new proposal, which focuses B3 on "block domain" and generalizes it a little bit:
# B4. The new proposal
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.block_domain.many("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj = T.block_domain.many(types="SS", binds=[i, j])
        vk = T.block_domain.many(types="R", binds=k + 1)  # <= can write arbitrary expression in binds
        T.reads(...)
        T.writes(...)
A bit more about naming: we do need to convey the concept of an axis or iter var in some way.
To explain one possible confusion here:
block_domain.S can be interpreted as one kind of “domain”, as if there were many block domains in a block. What we really want to say is that it is one iterator in the domain, and all of the iterators together form the domain.
Another possible way to highlight the block could be (although I am not attached to it):
for i, j, k in T.grid(512, 512, 512):
    # block is named as blockC
    with T.block() as blockC:
        vi = blockC.axis.S(512, i)
        vj = blockC.axis.S(512, j)
        vk = blockC.axis.R(512, k)
        blockC.reads(...)
        blockC.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block() as blockC:
        vi, vj, vk = blockC.axis.reuse("SSR", [i, j, k])
        blockC.reads(...)
        blockC.writes(...)
One potential drawback here is that the block name can be confused with the buffer name (if you want to name the block C directly).
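The `with T.block() as blockC` form relies on the context manager yielding a handle that the body then uses for axes, reads, and writes. Below is a plain-Python sketch of that shape. `BlockBuilder` and `block` are hypothetical stand-ins for whatever TVM script would provide, and the returned block-var names are for illustration only.

```python
from contextlib import contextmanager


class BlockBuilder:
    """Hypothetical handle yielded by the with statement; records what the body declares."""

    def __init__(self):
        self.axes = []
        self.read_regions = []
        self.write_regions = []

    def axis(self, kind, dom, bind):
        # Record one block axis and return an illustrative block-var name.
        self.axes.append((kind, dom, bind))
        return f"v{bind}"

    def reads(self, *regions):
        self.read_regions.extend(regions)

    def writes(self, *regions):
        self.write_regions.extend(regions)


@contextmanager
def block():
    # Yield a fresh builder; the name after `as` becomes the block's name in the script.
    builder = BlockBuilder()
    yield builder


with block() as blockC:
    vi = blockC.axis("S", 512, "i")
    vk = blockC.axis("R", 512, "k")
    blockC.reads("A", "B")
    blockC.writes("C")

print(blockC.axes)  # [('S', 512, 'i'), ('R', 512, 'k')]
```

Because the name after `as` is an ordinary Python variable, writing `with block() as C` would rebind any buffer already called `C` in the same scope, which is the naming clash described above.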
The new with statement looks pretty good to me, thanks for this proposal!
On the naming: what about using “blockC.domain_axis.S” instead of “blockC.axis.S”? Because a block doesn’t have axes, but its iteration domain does
CC: @zxybazh @shingjan
The main limitation of B5 is that the block name can no longer be the same as the buffer name (which can be a common requirement). Considering this, we might still want to bring back the old style but keep the name block_axis.
for i, j, k in T.grid(512, 512, 512):
    # block is named "C"
    with T.block("C"):
        vi = T.block_axis.S(512, i)
        vj = T.block_axis.S(512, j)
        vk = T.block_axis.R(512, k)
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block():
        vi, vj, vk = T.block_axis.reuse("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)
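The `reuse` shorthand can be read as sugar for one per-axis declaration per loop variable. The sketch below shows that desugaring as plain string generation, assuming the domain of each block axis is taken from the extent of the loop variable it reuses (an assumption; the thread does not spell this out). `LoopVar` and `desugar_reuse` are hypothetical names for illustration, not TVM APIs.

```python
from collections import namedtuple

# Hypothetical record for a loop variable and its extent.
LoopVar = namedtuple("LoopVar", ["name", "extent"])


def desugar_reuse(kinds, loop_vars):
    """Expand a reuse("SSR", [...]) call into one declaration line per block axis."""
    if len(kinds) != len(loop_vars):
        raise ValueError("need exactly one axis kind per loop variable")
    lines = []
    for kind, var in zip(kinds, loop_vars):
        if kind not in ("S", "R"):
            raise ValueError(f"unknown axis kind: {kind!r}")
        # Assumed rule: the block axis domain equals the loop extent.
        lines.append(f"v{var.name} = T.block_axis.{kind}({var.extent}, {var.name})")
    return lines


loops = [LoopVar("i", 512), LoopVar("j", 512), LoopVar("k", 512)]
for line in desugar_reuse("SSR", loops):
    print(line)
# vi = T.block_axis.S(512, i)
# vj = T.block_axis.S(512, j)
# vk = T.block_axis.R(512, k)
```

Under that assumption the two snippets above declare the same block axes; the explicit form is only needed when a binding is not a bare loop variable.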