tlc-pack / tvm-tensorir

Apache License 2.0

[Roadmap] TVMScript Frontend #471

Open Hzfengsy opened 3 years ago

Hzfengsy commented 3 years ago
  1. Support all kinds of nodes
    • [ ] WhileNode
    • [ ] BufferRealizeNode
    • [ ] ProducerLoadNode
    • [ ] ProducerStoreNode
    • [ ] ProducerRealizeNode
    • [ ] BlockNode (without BlockRealize)
    • [ ] AnyNode
  2. Support fragment printing
junrushao commented 3 years ago

Namespace and Tooling-Friendliness

This subsection is based on @yzh119's proposal #420 #426.

Pain points

Here is an example of how pylint complains about the things above: [image]

Proposal

With the proposal above, we are able to provide type stubs, so that users get TVM scripts that work well with linting and auto-completion.

Here is an example of the proposed syntax:

import tvm
from tvm.script import tir as T
# ^ there is a broadly accepted precedent for doing this in the Python community: from keras import backend as K

@tvm.script.IRModule                                                   # so it generates an IRModule
class Module:
  @T.PrimFunc                                                          # it generates a PrimFunc
  def func(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, [128, 128], dtype="float32")                 # stub provided for tvm.script.tir.match_buffer
    B = T.match_buffer(b, [128, 128], dtype="float32")
    C = T.match_buffer(c, [128, 128], dtype="float32")
    with T.block([128, 128, T.reduce_axis(0, 128)], "C") as [i, j, k]: # stub provided for tvm.script.tir.block
        C[i, j] = T.if_then_else(                                      # stub provided for tvm.script.tir.if_then_else
            i == 0 and j == 0 and k == 0,
            0.0,
            C[i, j] + A[i, k] * B[k, j],
            dtype="float32",
        )

>>> print(type(Module))
<class 'tvm.ir.module.IRModule'>

>>> print(type(Module["func"]))
<class 'tvm.tir.function.PrimFunc'>
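
The two `print` calls above rely on a decorated class coming back as a module object rather than as a Python class. Those decorator mechanics can be mocked in plain Python. This is only a sketch and not TVM's implementation: `FakeIRModule`, `FakePrimFunc`, `ir_module`, and `prim_func` are hypothetical stand-ins, and no parsing of the function body happens here.

```python
class FakePrimFunc:
    """Hypothetical stand-in for a parsed function (not tvm.tir.PrimFunc)."""

    def __init__(self, py_func):
        self.name = py_func.__name__
        self.py_func = py_func


class FakeIRModule:
    """Hypothetical stand-in for a module that maps names to functions."""

    def __init__(self, functions):
        self._functions = dict(functions)

    def __getitem__(self, name):
        return self._functions[name]


def prim_func(py_func):
    # Function decorator: replace the Python function with a wrapper object.
    return FakePrimFunc(py_func)


def ir_module(cls):
    # Class decorator: collect the wrapped functions defined in the class body
    # and return a module object instead of the class itself.
    functions = {
        name: value
        for name, value in vars(cls).items()
        if isinstance(value, FakePrimFunc)
    }
    return FakeIRModule(functions)


@ir_module
class Module:
    @prim_func
    def func(a, b, c):
        pass


print(type(Module).__name__)          # FakeIRModule
print(type(Module["func"]).__name__)  # FakePrimFunc
```

Because the decorators are ordinary Python callables, a type stub can declare their signatures, which is what lets linters and auto-completion understand the script.
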
junrushao commented 3 years ago

Block and block bindings: Proposal B0

Pain points

Proposal

Here is the philosophy behind the proposed design

G1. The complete form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3) as [vi, vj, vk]:
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.iter_dom_dim(var=vj, type='S', dom=512, bind=j)
    T.iter_dom_dim(var=vk, type='R', dom=512, bind=k)
    T.reads(...)
    T.writes(...)

G2. With full trivial bindings

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind="SSR") as [i, j, k]: # <= redefinition treated as binding
    T.reads(...)
    T.writes(...)

G3. With partial trivial bindings

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.reads(...)
    T.writes(...)
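
One way to read the `trivial_bind` strings in G2 and G3 is sketched below. The helper `split_trivial_bind` is hypothetical, and so is the convention it encodes: `S` or `R` at a position means that axis is bound trivially to the loop variable at the same position, while `.` means the binding has to be written out with `T.iter_dom_dim`.

```python
def split_trivial_bind(spec, loop_vars):
    """Split a trivial_bind spec into auto-bound axes and explicit positions.

    Assumed convention (not confirmed by the proposal text): 'S' / 'R' bind the
    axis trivially to the loop variable at the same position; '.' leaves the
    axis to be bound explicitly.
    """
    if len(spec) != len(loop_vars):
        raise ValueError("spec length must equal the number of loop variables")
    auto, explicit = {}, []
    for position, (kind, var) in enumerate(zip(spec, loop_vars)):
        if kind == ".":
            explicit.append(position)
        elif kind in ("S", "R"):
            auto[var] = kind
        else:
            raise ValueError(f"unknown axis type {kind!r}")
    return auto, explicit


# G2: every axis is bound trivially.
print(split_trivial_bind("SSR", ["i", "j", "k"]))
# G3: the first axis needs an explicit binding.
print(split_trivial_bind(".SR", ["i", "j", "k"]))
```
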

G4. No automatic loop induction

Generating loops on top of blocks looks a bit weird in terms of semantics, even though it can be conveyed with extra documentation. With our binding design, we don't actually need this powerful tool.

junrushao commented 3 years ago

410

tqchen commented 3 years ago

It would be great to discuss a few candidates for blocks and block bindings. I labeled @junrushao1994's proposal as B0; let us also list the current definition and new proposals, so we have a clear basis for discussion.

tqchen commented 3 years ago

Block and block bindings: Proposal B1

Note that this form discards the desire of putting iterators on the block, but instead focuses on getting some information right in the block body.

Complete Form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C"):
    # the API name is subject to change
    vi = T.axis.S(512, i)
    vj = T.axis.S(512, j)
    vk = T.axis.R(512, k)
    T.reads(...)
    T.writes(...)

Note that the API name can change

Allow Autobinding some vars

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_axis=[i, j, k]):
       C[i, j] += A[i, k] * B[j, k]

Key design pts:

Another alternative (add mapping property declarations)

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_spatial_axis=[i, j], map_reduce_axis=[k]):
       C[i, j] += A[i, k] * B[j, k]
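
For readers less familiar with the spatial/reduce distinction that `map_spatial_axis` and `map_reduce_axis` declare, here is a plain-Python sketch of what the block body computes. It does not use TVM; `itertools.product` stands in for `T.grid`, and the extent is shrunk from 512 to 2 purely for illustration.

```python
from itertools import product

N = 2  # tiny extent for illustration; the proposal uses 512
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0] * N for _ in range(N)]

# product(...) plays the role of T.grid(N, N, N) here.
for i, j, k in product(range(N), range(N), range(N)):
    # i and j are spatial: each (i, j) pair owns exactly one output element.
    # k is a reduction axis: every k accumulates into the same C[i][j].
    C[i][j] += A[i][k] * B[j][k]

print(C)  # [[17, 23], [39, 53]]
```
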

Note on advanced constraints

As we extend to future iteration patterns, we might want to introduce additional constraints, where the iterators can no longer be declared separately. As a mock-up example, we might introduce a concept of an axis group to declare a non-trivial interaction among three axes, which then need to be declared together. We need to think about how our convention extends to this case.

for i, j, k in T.grid(512, 512, 512):
  with T.block("C"):
    vi, vj, vk = T.sparse.axis_group(
        [512, 512, 512], "Dense,Sparse,Dense",
        [value0, value1, value2]
    )
    T.reads(...)
    T.writes(...)
tqchen commented 3 years ago

Block and block bindings: Proposal B2

This is the current form

Complete Form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", [512, 512, T.reduce_axis(512)]) as vi, vj, vk:
    # the API name is subject to change
    T.bind(vi, i)
    T.bind(vj, j)
    T.bind(vk, k)
    T.reads(...)
    T.writes(...)

Autobinding is implicit

  with T.block([512, 512, T.reduce_axis(512)]) as vi, vj, vk:
       C[i, j] += A[i, k] * B[j, k]
Hzfengsy commented 3 years ago

Thanks for the great discussion and proposals. Here are two major points, in my opinion.

  1. Let users know there are block vars and bindings
  2. It would be great to keep the number of lines small, since one block may have more than 5 block vars in a conv2d workload.

Block and block bindings: Proposal B3

Complete Form

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi = T.axis.S(i, 512)
        vj = T.axis.S(j, 512)
        vk = T.axis.R(k, 512)
        T.reads(...)
        T.writes(...)

A Sugar for Complete Form

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)
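
To make the relationship between the sugar and the complete form concrete, the sketch below expands a `T.iter`-style call into one axis record per variable. It is a plain-Python mock, not TVM code: `LoopVar`, `BlockAxis`, `axis`, and `iter_sugar` are hypothetical names, and it assumes the sugar takes each domain from the extent of the loop variable it binds to, since the sugar itself does not spell out the domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopVar:
    """Hypothetical loop variable that remembers its extent."""
    name: str
    extent: int


@dataclass(frozen=True)
class BlockAxis:
    """Hypothetical record of one block iterator."""
    kind: str      # "S" (spatial) or "R" (reduce)
    bind: LoopVar  # the loop variable it is bound to
    dom: int       # extent of the iterator's domain


def axis(kind, bind, dom):
    # Complete form: one call per block iterator, as in T.axis.S(i, 512).
    return BlockAxis(kind, bind, dom)


def iter_sugar(binds, kinds):
    # Sugar: expand into one complete-form call per variable.
    # Assumption: the domain is taken from the bound loop variable's extent.
    if len(binds) != len(kinds):
        raise ValueError("need exactly one axis type per binding")
    return [axis(kind, var, var.extent) for var, kind in zip(binds, kinds)]


i, j, k = LoopVar("i", 512), LoopVar("j", 512), LoopVar("k", 512)
vi, vj, vk = iter_sugar([i, j, k], "SSR")
print(vi.kind, vj.kind, vk.kind)  # S S R
```
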

Auto binding

Not needed in this format

junrushao commented 3 years ago

Thanks @tqchen and @Hzfengsy for the proposals!

First of all, we seem to converge on the point that we don't want the with statement to contain all the block information, which can be overwhelming to a certain extent: imagine a conv2d with 3 spatial axes and 4 reduction axes, which are unrealistic to put on a single line without causing confusion.

Block binding

On the syntax of a block binding, I listed proposals B0, B1 and B3 below for detailed comparison:

# Syntax in B0
T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
# Syntax in B1
vi = T.axis.S(i, 512)
# Syntax in B3
vi = T.axis.S(512, i)

Both B1 and B3 treat bindings as assignments, which from my PoV is not a big problem and looks cleaner (PL guys might disagree). Also, both B1 and B3 seem to use standalone scoping for these bindings, which I feel is better than B0.

The difference between B1 and B3 is the order of arguments. I would prefer B3, which makes it easier for users to write fragments where a Block can exist without a BlockRealize.

One thing I am not so sure about is naming. As @Hzfengsy said, we would love the syntax itself to convey the design philosophy (let users know there are block vars and bindings), so I feel strongly that we should emphasize the concept "block domain", or "iteration domain of the block". Therefore I would love to propose the following:

# B4. The new proposal
vi = T.block_domain.S(domain=512, bind=i)

# In the doc, which pops up almost instantly in users' vscode/vim/other IDEs
# we can say this is shortcut for `T.block_domain.spatial_axis`

Auto-binding for Trivial Bindings

Looks like we have 3 different proposals here:

# Syntax in B0
for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:  # <= redefinition treated as binding
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.reads(...)
    T.writes(...)

# Syntax in B1
for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_axis=[i, j, k]):
       C[i, j] += A[i, k] * B[j, k]

# Syntax in B3
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)

Below is my understanding:

Therefore, I would love to go with B3, with some minor naming changes to make sure our definition is always focused on one and only one concept: "block domain". Here is my new proposal, which focuses B3 on "block domain" and generalizes it a little bit:

# B4. The new proposal
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.block_domain.many("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj = T.block_domain.many(types="SS", binds=[i, j])
        vk = T.block_domain.many(types="R", binds=k + 1)  # <= can write arbitrary expression in binds
        T.reads(...)
        T.writes(...)
tqchen commented 3 years ago

A bit more about naming: we do need to convey the concept of an axis or iter var in some way.

To explain one possible confusion here.

block_domain.S can be interpreted as one kind of "domain", suggesting that there are many block domains in a block. What we really want to say is that it is one iterator in the domain, and that all of the iterators together form the domain.

Another possible way to highlight the block could be the following (although I am not attached to it):

Refer to Block name explicitly: Proposal B5

for i, j, k in T.grid(512, 512, 512):
    # block is named as blockC
    with T.block() as blockC:
        vi = blockC.axis.S(512, i)
        vj = blockC.axis.S(512, j)
        vk = blockC.axis.R(512, k)
        blockC.reads(...)
        blockC.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block() as blockC:
        vi, vj, vk = blockC.axis.reuse("SSR", [i, j, k])
        blockC.reads(...)
        blockC.writes(...)
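
The shape of B5, where the object bound by `with ... as` carries the axis and region declarations, can be mocked in plain Python as below. This is a sketch under stated assumptions and not TVM code: `FakeBlock`, `_AxisNamespace`, and `block` are hypothetical, loop variables are passed as strings, and the mock does not derive the block name from the `as` target the way a real script parser could.

```python
from contextlib import contextmanager


class _AxisNamespace:
    """Hypothetical namespace behind `blockC.axis`."""

    def __init__(self, block):
        self._block = block

    def _add(self, kind, dom, bind):
        # Record the iterator on the owning block and return a name for it.
        self._block.iter_vars.append((kind, dom, bind))
        return f"v{bind}"

    def S(self, dom, bind):
        return self._add("S", dom, bind)

    def R(self, dom, bind):
        return self._add("R", dom, bind)


class FakeBlock:
    """Hypothetical block handle that collects its own declarations."""

    def __init__(self):
        self.iter_vars = []
        self.read_regions = []
        self.write_regions = []
        self.axis = _AxisNamespace(self)

    def reads(self, *regions):
        self.read_regions.extend(regions)

    def writes(self, *regions):
        self.write_regions.extend(regions)


@contextmanager
def block():
    # Yield a fresh handle; everything declared inside the `with` body
    # ends up attached to this one object.
    yield FakeBlock()


with block() as blockC:
    vi = blockC.axis.S(512, "i")
    vj = blockC.axis.S(512, "j")
    vk = blockC.axis.R(512, "k")
    blockC.reads("A", "B")
    blockC.writes("C")

print(blockC.iter_vars)
```

The mock also makes the naming concern visible: once the handle is called `blockC`, the identifier `C` stays free for the buffer, but the block can no longer simply be named `C`.
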

One potential drawback here is that the block name can be confused with the buffer name (if you want to name the block C directly).

junrushao commented 3 years ago

The new with statement looks pretty good to me, thanks for this proposal!

On the naming: what about using "blockC.domain_axis.S" instead of "blockC.axis.S"? A block doesn't have axes, but its iteration domain does.

junrushao commented 3 years ago

CC: @zxybazh @shingjan

tqchen commented 3 years ago

The main limitation of B5 is that the block name can no longer be the same as the buffer name (which can be a common requirement). Considering this, we might still want to bring back the old style but keep the name block_axis.

for i, j, k in T.grid(512, 512, 512):
    # block is named "C"
    with T.block("C"):
        vi = T.block_axis.S(512, i)
        vj = T.block_axis.S(512, j)
        vk = T.block_axis.R(512, k)
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block():
        vi, vj, vk = T.block_axis.reuse("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)