Hzfengsy commented 3 years ago

Support all kinds of nodes
- [ ] WhileNode
- [ ] BufferRealizeNode
- [ ] ProducerLoadNode
- [ ] ProducerStoreNode
- [ ] ProducerRealizeNode
- [ ] BlockNode (without BlockRealize)
- [ ] AnyNode
Support fragment printing

junrushao commented 3 years ago

Namespace and Tooling-Friendliness

This subsection is based on @yzh119's proposal #420 #426.

Pain points

P1. No Python auto-completion support
P2. Usually conflicts with pylint
P3. APIs are scattered across namespaces like tvm.script, tvm.tir, tvm.script.ty
P4. It is somewhat non-trivial to understand at first glance what the decorator generates

Here is an example of how my pylint complains about the things above:

Proposal

A1. Use tvm.script as the “root” namespace for all TVM script related stuff
A2. Use tvm.script.tir for TIR, and idiomatically import it as T, like Keras is usually imported as K
A3. Use tvm.script.relax for Relax, and idiomatically import it as R
A4. To be consistent with the names of their resulting types, use
- tvm.script.IRModule for IRModule
- T.PrimFunc for tir.PrimFunc
- R.Function for relax.Function

With the proposal above, we are able to provide type stubs, so that users can write TVM scripts that work well with linting and auto-completion.

Here is an example of the proposed syntax:

import tvm
from tvm.script import tir as T
# ^ there is a broadly accepted precedent for this in the Python community: from keras import backend as K

@tvm.script.IRModule                                                   # so it generates an IRModule
class Module:
  @T.PrimFunc                                                          # it generates a PrimFunc
  def func(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, [128, 128], dtype="float32")                 # stub provided for tvm.script.tir.match_buffer
    B = T.match_buffer(b, [128, 128], dtype="float32")
    C = T.match_buffer(c, [128, 128], dtype="float32")
    with T.block([128, 128, T.reduce_axis(0, 128)], "C") as [i, j, k]: # stub provided for tvm.script.tir.block
        C[i, j] = T.if_then_else(                                      # stub provided for tvm.script.tir.if_then_else
            i == 0 and j == 0 and k == 0,
            0.0,
            C[i, j] + A[i, k] * B[k, j],
            dtype="float32",
        )

>>> print(type(Module))
<class 'tvm.ir.module.IRModule'>

>>> print(type(Module["func"]))
<class 'tvm.tir.function.PrimFunc'>
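
To make pain point P4 and proposal A4 concrete, here is a minimal sketch in plain Python, with no TVM dependency. The names `ToyIRModule`, `ToyPrimFunc`, `ir_module` and `prim_func` are hypothetical stand-ins, not TVM APIs. It only illustrates the idea that a decorator named after its result type, and annotated with that type, tells both the reader and the tooling what the decorated object becomes.

```python
from typing import Callable, Dict


class ToyPrimFunc:
    """Hypothetical stand-in for tir.PrimFunc: wraps the decorated Python function."""

    def __init__(self, source: Callable) -> None:
        self.name = source.__name__
        self.source = source


class ToyIRModule:
    """Hypothetical stand-in for IRModule: maps function names to ToyPrimFunc objects."""

    def __init__(self, functions: Dict[str, ToyPrimFunc]) -> None:
        self.functions = functions

    def __getitem__(self, name: str) -> ToyPrimFunc:
        return self.functions[name]


def prim_func(source: Callable) -> ToyPrimFunc:
    # The return annotation lets linters and IDEs know the resulting type.
    return ToyPrimFunc(source)


def ir_module(cls: type) -> ToyIRModule:
    # Collect every ToyPrimFunc defined in the class body.
    functions = {
        name: value for name, value in vars(cls).items() if isinstance(value, ToyPrimFunc)
    }
    return ToyIRModule(functions)


@ir_module
class Module:
    @prim_func
    def func() -> None:
        pass


print(type(Module).__name__)          # ToyIRModule
print(type(Module["func"]).__name__)  # ToyPrimFunc
```

A real implementation would parse the function body into IR instead of storing the Python callable; the sketch only covers the typing and naming aspect.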

junrushao commented 3 years ago

Block and block bindings: Proposal B0

Pain points

B1. Trivial bindings
B2. Block's iter domain duplicates with outer loops' loop domain
B3. Auto-complete is "too" automatic

Proposal

Here is the philosophy behind the proposed design

F1. Focus on the concept "iteration domain" of a block
F2. Minimize repetitive declaration for any trivial bindings
F3. Reduce the line width to improve readability

G1. The complete form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3) as [vi, vj, vk]:
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.iter_dom_dim(var=vj, type='S', dom=512, bind=j)
    T.iter_dom_dim(var=vk, type='R', dom=512, bind=k)
    T.reads(...)
    T.writes(...)

G2. With full trivial bindings

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind="SSR") as [i, j, k]: # <= redefinition treated as binding
    T.reads(...)
    T.writes(...)

G3. With partial trivial bindings

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.reads(...)
    T.writes(...)

G4. No automatic loop induction

Generating loops on top of blocks looks a bit weird in terms of semantics, even though it could be explained with extra documentation. With our binding design, we don't actually need this powerful tool.
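
As a rough illustration of the `trivial_bind` pattern string used in G2 and G3, the sketch below splits the block vars into those bound trivially and those that still need an explicit declaration. The helper `split_trivial_bind` is hypothetical and exists only for this example; the thread does not define how the pattern would actually be parsed.

```python
from typing import Dict, List, Tuple


def split_trivial_bind(pattern: str, names: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split block vars into trivially bound ones and ones needing an explicit declaration.

    Hypothetical helper: "S"/"R" mark a trivial spatial/reduction binding,
    "." marks an axis that still needs its own declaration.
    """
    if len(pattern) != len(names):
        raise ValueError("pattern and names must have the same length")
    trivial: Dict[str, str] = {}
    explicit: List[str] = []
    for kind, name in zip(pattern, names):
        if kind == ".":
            explicit.append(name)
        elif kind in ("S", "R"):
            trivial[name] = kind
        else:
            raise ValueError(f"unknown axis kind: {kind!r}")
    return trivial, explicit


trivial, explicit = split_trivial_bind(".SR", ["vi", "j", "k"])
print(trivial)   # {'j': 'S', 'k': 'R'}
print(explicit)  # ['vi']
```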

junrushao commented 3 years ago

410

tqchen commented 3 years ago

It would be great to discuss a few candidates for blocks and block bindings. I labeled @junrushao1994's proposal as B0; let us also list the current definition and new proposals, so we have a clear basis for discussion.

tqchen commented 3 years ago

Block and block bindings: Proposal B1

Note that this form discards the desire of putting iterators on the block, but instead focuses on getting some information right in the block body.

Complete Form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C"):
    # the API name is subject to change
    vi = T.axis.S(512, i)
    vj = T.axis.S(512, j)
    vk = T.axis.R(512, k)
    T.reads(...)
    T.writes(...)

Note that the API name can change

B1a: Do not mark axes in the block, since we are breaking the assumption that the with block is relatively self-contained.
B1b: Use block_var = match_axis_pattern(domain, value) to represent the value mapping; this is consistent with our use of match_buffer.
B1c: The naming of match_axis_pattern is subject to change; there are a few choices here:
- Simply encode type as function name
- Encode type into keyword arguments
- Use a namespace to emphasize the kind (T.axis)

Allow Autobinding some vars

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_axis=[i, j, k]):
       C[i, j] += A[i, k] * B[j, k]

Key design points:

B1d: The block contains a list of iterators that are passed as block_vars; they can be used directly in the body.
Naming is subject to change:
- map_axis: inspired by memmap
- auto_bind: automatically bind iterators

Another alternative (add mapping property declarations)

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_spatial_axis=[i, j], map_reduce_axis=[k]):
       C[i, j] += A[i, k] * B[j, k]

Note on advanced constraints

As we extend to future iteration patterns, we might want to introduce additional constraints, where the iterators can no longer be declared separately. As a mock-up example, we might introduce the concept of an axis group to declare a non-trivial interaction among three axes, which need to be declared together. We need to think about how our convention extends to this case.

for i, j, k in T.grid(512, 512, 512):
  with T.block("C"):
    vi, vj, vk = T.sparse.axis_group([512, 512, 512], "Dense,Sparse,Dense",
        [value0, value1, value2]
     )
    T.reads(...)
    T.writes(...)

tqchen commented 3 years ago

Block and block bindings: Proposal B2

This is the current form

Complete Form

for i, j, k in T.grid(512, 512, 512):
  with T.block("C", [512, 512, T.reduce_axis(512)]) as [vi, vj, vk]:
    # the API name is subject to change
    T.bind(vi,  i)
    T.bind(vj,  j)
    T.bind(vk,  k)
    T.reads(...)
    T.writes(...)

Autobinding is implicit

  with T.block([512, 512, T.reduce_axis(512)]) as [vi, vj, vk]:
       C[vi, vj] += A[vi, vk] * B[vj, vk]

Hzfengsy commented 3 years ago

Thanks for the great discussion and proposals. Here are two major points from my point of view.

- Let users know there are block vars and bindings.
- It would be great to keep the number of lines small, since one block may have more than 5 block vars in a conv2d workload.

Block and block bindings: Proposal B3

Complete Form

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi = T.axis.S(i, 512)
        vj = T.axis.S(j, 512)
        vk = T.axis.R(k, 512)
        T.reads(...)
        T.writes(...)

A Sugar for Complete Form

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)

Auto binding

Not needed in this format

junrushao commented 3 years ago

Thanks @tqchen and @Hzfengsy for the proposals!

First of all, we seem to be converging on the view that the with statement should not contain all the block information, which can be overwhelming to a certain extent: imagine a conv2d with 3 spatial axes and 4 reduction axes; it is unrealistic to put them all on a single line without causing confusion.

Block binding

On the syntax of a block binding, I list proposals B0, B1 and B3 below for a detailed comparison:

# Syntax in B0
T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
# Syntax in B1
vi = T.axis.S(i, 512)
# Syntax in B3
vi = T.axis.S(512, i)

Both B1 and B3 treat bindings as assignments, which from my PoV is not a big problem and looks cleaner (PL guys might disagree). Also, both B1 and B3 seem to use standalone scoping for these bindings, which I feel is better than B0.

The difference between B1 and B3 is the order of arguments. I would prefer B3, which makes it easier for users to write a fragment where a Block can exist without BlockRealize.

One thing I am not so sure about is naming. As @Hzfengsy said, we would love the syntax itself to convey the design philosophy (let users know there are block vars and bindings), so I feel strongly that we should emphasize the concept "block domain", or "iteration domain of the block". Therefore I would like to propose the following:

# B4. The new proposal
vi = T.block_domain.S(domain=512, bind=i)

# In the doc, which pops up almost instantly in users' vscode/vim/other IDEs
# we can say this is shortcut for `T.block_domain.spatial_axis`

Auto-binding for Trivial Bindings

Looks like we have 3 different proposals here:

# Syntax in B0
for i, j, k in T.grid(512, 512, 512):
  with T.block("C", iter_dom_ndim=3, trivial_bind=".SR") as [vi, j, k]:  # <= redefinition treated as binding
    T.iter_dom_dim(var=vi, type='S', dom=512, bind=i)
    T.reads(...)
    T.writes(...)

# Syntax in B1
for i, j, k in T.grid(512, 512, 512):
  with T.block("C", map_axis=[i, j, k]):
       C[i, j] += A[i, k] * B[j, k]

# Syntax in B3
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.iter([i, j, k], "SSR")
        T.reads(...)
        T.writes(...)

Below is my understanding:

- The redefinition-as-trivial-binding semantics in B0 is admittedly somewhat confusing and unpythonic.
- B1 seems to introduce some interesting semantics which took me quite a while to understand (specifically "map" in a "block").
- B3 is the most natural way from my PoV, and it doesn't deviate from our previous binding definition.

Therefore, I would love to go with B3, with some minor naming changes to make sure our definition always focuses on one and only one concept: "block domain". Here is my new proposal, which focuses B3 on "block domain" and generalizes it a little:

# B4. The new proposal
for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj, vk = T.block_domain.many("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block("C"):
        vi, vj = T.block_domain.many(types="SS", binds=[i, j])
        vk = T.block_domain.many(types="R", binds=k + 1)  # <= can write arbitrary expression in binds
        T.reads(...)
        T.writes(...)

tqchen commented 3 years ago

A bit more about naming: we do need to convey the concept of axis or iter var in some way.

To explain one possible confusion here.

block_domain.S can be interpreted as one kind of "domain", implying that there are many block domains in a block. What we really want to say is that it is one iterator in the domain, and that all of the iterators together form a domain.

Another possible way to highlight the block could be (although I am not attached to it):

with block() as b: vi = b.axis.S

Refer to Block name explicitly: Proposal B5

for i, j, k in T.grid(512, 512, 512):
    # block is named as blockC
    with T.block() as blockC:
        vi = blockC.axis.S(512, i)
        vj = blockC.axis.S(512, j)
        vk = blockC.axis.R(512, k)
        blockC.reads(...)
        blockC.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block() as blockC:
        vi, vj, vk = blockC.axis.reuse("SSR", [i, j, k])
        blockC.reads(...)
        blockC.writes(...)

One potential drawback here is that the block name can be confused with the buffer name (if you want to name the block C directly).
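
For comparison with the forms that rely on an implicit current block, B5 can be sketched with an explicit block handle that carries its own `axis` namespace. Everything here (`ToyBlock`, `_AxisNamespace`) is a hypothetical mock in plain Python, not a TVM API; it only shows that the handle returned by the `with` statement is enough to attach axis declarations to the right block.

```python
from typing import List, Tuple


class _AxisNamespace:
    """Hypothetical `block.axis` namespace bound to one specific block."""

    def __init__(self, owner: "ToyBlock") -> None:
        self._owner = owner

    def S(self, dom: int, value: object) -> object:
        return self._owner._add("S", dom, value)

    def R(self, dom: int, value: object) -> object:
        return self._owner._add("R", dom, value)


class ToyBlock:
    """Hypothetical block handle usable as a context manager."""

    def __init__(self) -> None:
        self.iter_vars: List[Tuple[str, int, object]] = []
        self.axis = _AxisNamespace(self)

    def _add(self, kind: str, dom: int, value: object) -> object:
        self.iter_vars.append((kind, dom, value))
        return value  # the toy block var is simply the bound value

    def __enter__(self) -> "ToyBlock":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # do not swallow exceptions


with ToyBlock() as blockC:
    vi = blockC.axis.S(512, "i")
    vj = blockC.axis.S(512, "j")
    vk = blockC.axis.R(512, "k")

print(blockC.iter_vars)  # [('S', 512, 'i'), ('S', 512, 'j'), ('R', 512, 'k')]
```

No global state is needed in this variant, at the cost of the Python variable name (`blockC`) doubling as the block's identity.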

junrushao commented 3 years ago

The new with statement looks pretty good to me, thanks for this proposal!

On the naming: what about using “blockC.domain_axis.S” instead of “blockC.axis.S”? Because a block doesn’t have axes, but its iteration domain does

junrushao commented 3 years ago

CC: @zxybazh @shingjan

tqchen commented 3 years ago

The main limitation of B5 is that the block name can no longer be the same as the buffer name (which can be a common requirement). Considering this, we might still want to bring back the old style but keep the name block_axis.

for i, j, k in T.grid(512, 512, 512):
    # block is named "C"
    with T.block("C"):
        vi = T.block_axis.S(512, i)
        vj = T.block_axis.S(512, j)
        vk = T.block_axis.R(512, k)
        T.reads(...)
        T.writes(...)

for i, j, k in T.grid(512, 512, 512):
    with T.block():
        vi, vj, vk = T.block_axis.reuse("SSR", [i, j, k])
        T.reads(...)
        T.writes(...)
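
The relationship between the two forms above, one declaration per axis versus the `reuse` sugar, can be sketched as follows. This is a plain-Python mock under stated assumptions: `LoopVar`, `IterVar`, `S`, `R` and `reuse` are hypothetical, and the sketch assumes the sugar takes each axis domain from the extent of the loop variable it binds to, which the thread does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoopVar:
    """Hypothetical loop variable that remembers its extent."""

    name: str
    extent: int


@dataclass(frozen=True)
class IterVar:
    """Hypothetical block iterator: kind ('S' or 'R'), domain, and bound value."""

    kind: str
    dom: int
    bind: LoopVar


def S(dom: int, bind: LoopVar) -> IterVar:
    return IterVar("S", dom, bind)


def R(dom: int, bind: LoopVar) -> IterVar:
    return IterVar("R", dom, bind)


def reuse(kinds: str, loop_vars: List[LoopVar]) -> List[IterVar]:
    # Assumption: the domain of each axis is the extent of the loop it reuses.
    if len(kinds) != len(loop_vars):
        raise ValueError("kinds and loop_vars must have the same length")
    make = {"S": S, "R": R}
    return [make[kind](lv.extent, lv) for kind, lv in zip(kinds, loop_vars)]


i, j, k = LoopVar("i", 512), LoopVar("j", 512), LoopVar("k", 512)
explicit = [S(512, i), S(512, j), R(512, k)]
sugared = reuse("SSR", [i, j, k])
print(explicit == sugared)  # True
```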

tlc-pack / tvm-tensorir

[Roadmap] TVMScript Frontend #471
