Closed janboeye closed 6 years ago
No.
I transformed this constant matrix with a utility function, const_array,
and unrolled all of the transform matrix multiplications. TVM's simplify pass then removes the zero elements. You can print the lowered IR to confirm.
I posted the performance comparison in https://github.com/dmlc/tvm/pull/898
@merrymercy
Thanks for the explanation. Could you explain how const_array helps to remove the zeros?
produce G {
for (i, 0, 4) {
for (j, 0, 3) {
G[((i*3) + j)] = select(((i == 3) && (j == 2)), 1.000000f, select((((i == 3) && (j == 1)) || ((i == 3) && (j == 0))), 0.000000f, select(((i == 2) && (j == 2)), 0.500000f, select(((i == 2) && (j == 1)), -0.500000f, select((((i == 2) && (j == 0)) || (((i == 1) && (j == 2)) || (((i == 1) && (j == 1)) || ((i == 1) && (j == 0))))), 0.500000f, select((((i == 0) && (j == 2)) || (((i == 0) && (j == 1)) || !((i == 0) && (j == 0)))), 0.000000f, 1.000000f))))))
}
}
}
I got the IR, but I do not understand how TVM can remove all the zeros in this select statement.
Thanks
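As an illustration of what the simplifier has to work with, here is a plain-Python transcription of the select chain from the IR above (this is a sketch, not TVM code; `select`, `g_ir`, and the `G` table are names introduced only for illustration). Once the i and j loops are unrolled, every condition compares constants, so each select folds to a single literal, and any product with a 0.0 literal can then be dropped:

```python
def select(cond, t, f):
    # Mirrors the select() in the IR: picks one of two constant arms.
    return t if cond else f

def g_ir(i, j):
    # Transcription of the nested select chain in the lowered IR above.
    return select((i == 3) and (j == 2), 1.0,
           select(((i == 3) and (j == 1)) or ((i == 3) and (j == 0)), 0.0,
           select((i == 2) and (j == 2), 0.5,
           select((i == 2) and (j == 1), -0.5,
           select(((i == 2) and (j == 0)) or ((i == 1) and (j == 2))
                  or ((i == 1) and (j == 1)) or ((i == 1) and (j == 0)), 0.5,
           select(((i == 0) and (j == 2)) or ((i == 0) and (j == 1))
                  or not ((i == 0) and (j == 0)), 0.0, 1.0))))))

# The chain is just a lookup of the Winograd F(2x2, 3x3) kernel-transform
# matrix G; with constant i and j it folds to one of these literals.
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]

assert all(g_ir(i, j) == G[i][j] for i in range(4) for j in range(3))
```

With symbolic i and j (as in the un-unrolled loops above), none of the conditions can be decided, which is why the selects survive in that form.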
You should lower the whole kernel transform
https://github.com/dmlc/tvm/blob/5d53f0f9ecb490245f8dba542437b5b70b7ba87d/topi/python/topi/mali/conv2d.py#L566-L571
and unroll the axes eps, nu, r_kh, and r_kw. Then these select expressions will be simplified.
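To sketch what that simplification produces (plain Python, not TVM; `unrolled_stmt` and the `w[...]` naming are illustrative assumptions), here is the shape of the statements once eps, nu, r_kh, and r_kw are fully unrolled: every select has collapsed to a literal coefficient, and the zero products are gone from the sums.

```python
# Winograd F(2x2, 3x3) kernel-transform matrix encoded by the selects above.
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]

def unrolled_stmt(eps, nu):
    """Render U[eps][nu] as it reads after unrolling plus simplify."""
    terms = []
    for r_kh in range(3):            # reduction axes, fully unrolled
        for r_kw in range(3):
            c = G[eps][r_kh] * G[nu][r_kw]
            if c != 0.0:             # simplify drops the zero products
                terms.append("%g*w[%d][%d]" % (c, r_kh, r_kw))
    return "U[%d][%d] = " % (eps, nu) + " + ".join(terms)

print(unrolled_stmt(0, 0))   # -> U[0][0] = 1*w[0][0]
print(unrolled_stmt(1, 2))   # all nine 0.25-magnitude terms survive here
```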
@merrymercy
I wrote the lowering code like the following
# transform kernel
s[G].compute_inline()
eps, nu, k, c, kk = s[U].op.axis
r_kh, r_kw = s[U].op.reduce_axis
s[U].reorder(k, c, kk, eps, nu, r_kh, r_kw)
_ = [s[U].unroll(x) for x in [eps, nu, r_kh, r_kw]]
print("transform kernel lower")
su = tvm.create_schedule(s[U].op)
print(tvm.lower(su, [kernel, G], simple_mode=True))
but I got following IR:
produce G {
for (i, 0, 4) {
for (j, 0, 3) {
G[((i*3) + j)] = select(((i == 3) && (j == 2)), 1.000000f, select((((i == 3) && (j == 1)) || ((i == 3) && (j == 0))), 0.000000f, select(((i == 2) && (j == 2)), 0.500000f, select(((i == 2) && (j == 1)), -0.500000f, select((((i == 2) && (j == 0)) || (((i == 1) && (j == 2)) || (((i == 1) && (j == 1)) || ((i == 1) && (j == 0))))), 0.500000f, select((((i == 0) && (j == 2)) || (((i == 0) && (j == 1)) || !((i == 0) && (j == 0)))), 0.000000f, 1.000000f))))))
}
}
}
produce U {
for (eps, 0, 4) {
for (nu, 0, 4) {
for (k, 0, 256) {
for (c, 0, 1280) {
for (kk, 0, 4) {
U[((((((((eps*4) + nu)*256) + k)*1280) + c)*4) + kk)] = 0.000000f
for (r_kh, 0, 3) {
for (r_kw, 0, 3) {
U[((((((((eps*4) + nu)*256) + k)*1280) + c)*4) + kk)] = (U[((((((((eps*4) + nu)*256) + k)*1280) + c)*4) + kk)] + ((weight[(((((((k*5120) + c) + (kk*1280))*3) + r_kh)*3) + r_kw)]*G[((eps*3) + r_kh)])*G[((nu*3) + r_kw)]))
}
}
}
}
}
}
}
}
Could you help me check why my lowering code is not right?
The generated CUDA code has already removed the zeros in G, though, and s[G].compute_inline() is necessary.
Thanks
I modified my lowering code like the following
print("transform kernel lower")
su = tvm.create_schedule(s[U].op)
su[G].compute_inline()
eps1, nu1, k1, c1, kk1 = su[U].op.axis
r_kh1, r_kw1 = su[U].op.reduce_axis
su[U].reorder(k1, c1, kk1, eps1, nu1, r_kh1, r_kw1)
_ = [su[U].unroll(x) for x in [eps1, nu1, r_kh1, r_kw1]]
print(tvm.lower(su, [kernel, G], simple_mode=True))
Now I get the correct IR.
Thanks
Hi @merrymercy,
In conv2d.py, the Winograd algorithm applies G and B as normal matrix multiplications, so it does not reduce the number of multiplications by turning them into additions/subtractions. Is my understanding correct?
Thanks
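To quantify that point (a plain-Python sketch; the counts are derived from the G matrix shown in the IR above and only concern the kernel transform): unrolling plus simplify removes exactly the products whose constant coefficient is zero, while the remaining coefficients such as +-0.5 and +-1 stay as real multiplications. A hand-written Winograd transform would additionally turn the +-1 cases into plain adds/subtracts.

```python
# Winograd F(2x2, 3x3) kernel-transform matrix G, as in the lowered IR above.
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]

# Naive lowering: one multiply per (eps, nu, r_kh, r_kw) combination.
naive = 4 * 4 * 3 * 3

# After unrolling, simplify drops every term whose coefficient is 0.0 ...
surviving = sum(
    1
    for eps in range(4) for nu in range(4)
    for r_kh in range(3) for r_kw in range(3)
    if G[eps][r_kh] * G[nu][r_kw] != 0.0
)

# ... but coefficients like 0.25 or -0.5 are still genuine multiplications.
print(naive, surviving)   # 144 -> 64 multiplies per output tile
```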