microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License

[BUG] Antares IR problem in a model with Transformer architecture #212

Open xysmlx opened 3 years ago

xysmlx commented 3 years ago

🐛 Bug

GatherV2

At least one of the following candidate Antares IRs is wrong.

[INFO] 2021-02-01T03:02:52z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[] = input0[input1[]]; ", input_dict={ "input0" : { "dtype" : "int64", "shape" : [3]} ,  "input1" : { "dtype" : "int64", "shape" : [1]} })
[INFO] 2021-02-01T03:02:52z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 20]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 20]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 10]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 5]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 32]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 20]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 10]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 32]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 20]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [30522, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 10]} })
[INFO] 2021-02-01T03:02:53z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[] = input0[input1[]]; ", input_dict={ "input0" : { "dtype" : "int64", "shape" : [2]} ,  "input1" : { "dtype" : "int64", "shape" : [1]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 20]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 20]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 10]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 5]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 32]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 20]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 10]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 32]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 20]} })
[INFO] 2021-02-01T03:02:54z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [2, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 10]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 20]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 20]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 10]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [3200, 5]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [320, 32]} })
[INFO] 2021-02-01T03:02:55z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 20]} })
[INFO] 2021-02-01T03:02:56z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 10]} })
[INFO] 2021-02-01T03:02:56z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 32]} })
[INFO] 2021-02-01T03:02:56z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [64, 20]} })
[INFO] 2021-02-01T03:02:56z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        GatherV2, ir:  - einstein_v2(" output0[N0, N1, N2] = input0[input1[N0, N1], N2]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [32, 128]} ,  "input1" : { "dtype" : "int64", "shape" : [1280, 10]} })

Possible cause: Antares may fail when evaluating generated kernels that use indexed access (e.g., Gather), because the randomly generated input data can contain indices outside the bounds of the indexed tensor, which leads to invalid memory accesses.
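To make the suspected failure mode concrete, here is a minimal pure-Python sketch of the gather semantics from the log (`output0[N0, N1, N2] = input0[input1[N0, N1], N2]`), using the `[32, 128]` table and `[64, 10]` index shapes from one of the candidates. The helpers `gather` and `random_indices` are hypothetical names for illustration only. They are not part of Antares or NNFusion, and this is not how Antares actually generates its evaluation data.

```python
import random

def gather(table, indices):
    """Reference semantics of output0[N0, N1, N2] = input0[input1[N0, N1], N2]."""
    num_rows = len(table)
    out = []
    for row in indices:
        out_row = []
        for idx in row:
            if not 0 <= idx < num_rows:
                # A compiled GPU kernel has no such check; it would read out of bounds.
                raise IndexError(f"index {idx} out of range for {num_rows} rows")
            out_row.append(list(table[idx]))
        out.append(out_row)
    return out

def random_indices(shape, num_rows, rng, bounded):
    """Hypothetical generator for the int64 index input (input1)."""
    n0, n1 = shape
    # Unbounded: arbitrary non-negative int64 values; bounded: valid row ids only.
    hi = num_rows if bounded else 2**63 - 1
    return [[rng.randrange(hi) for _ in range(n1)] for _ in range(n0)]

rng = random.Random(0)
# input0 with shape [32, 128], filled with random floats
table = [[rng.random() for _ in range(128)] for _ in range(32)]

# Arbitrary int64 indices: the reference gather rejects them.
bad = random_indices((64, 10), len(table), rng, bounded=False)
try:
    gather(table, bad)
    unbounded_ok = True
except IndexError:
    unbounded_ok = False

# Indices restricted to [0, 32): the gather succeeds.
good = random_indices((64, 10), len(table), rng, bounded=True)
out = gather(table, good)
out_shape = (len(out), len(out[0]), len(out[0][0]))
```

If this is indeed the cause, restricting the generated index values to `[0, input0.shape[0])` during kernel evaluation would avoid the invalid accesses.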

Dot

[ERROR] 2021-02-01T03:02:57z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 58         - einstein_v2(" output0[N, M] +=! input0[N, K] * input1[K, M]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [3200, 20, 128]} ,  "input1" : { "dtype" : "float32", "shape" : [128, 128]} })
[ERROR] Traceback (most recent call last):
  File "./antares/antares_compiler.py", line 622, in get
    code = main_compute(code_only=True)
  File "./antares/antares_compiler.py", line 377, in main_compute
    task = autotvm.task.create("template_op", args=(), target=tvm_target)
  File "/opt/tvm/python/tvm/autotvm/task/task.py", line 457, in create
    sch, _ = ret.func(*args)
  File "/opt/tvm/python/tvm/autotvm/task/task.py", line 236, in __call__
    return self.fcustomized(*args, **kwargs)
  File "/antares/lang/generic.py", line 187, in get_template_op
    exec('import tvm; from tvm import topi; ' + program, globals())
  File "<string>", line 1, in <module>
  File "/antares/lang/generic.py", line 29, in einstein_v2
    exec(ir, globals())
  File "<string>", line 2, in <module>
  File "/antares/lang/generic.py", line 96, in output
    result = te.compute(shape, func, name=name, tag=tag)
  File "/opt/tvm/python/tvm/te/operation.py", line 105, in compute
    body = fcompute(*[v.var for v in dim_var])
  File "<string>", line 2, in <lambda>
  File "/opt/tvm/python/tvm/tir/expr.py", line 75, in __mul__
    return _generic.multiply(self, other)
  File "/opt/tvm/python/tvm/topi/generic_op_impl.py", line 83, in _tensor_bop_impl
    return orig_bop(lhs, rhs)
  File "/opt/tvm/python/tvm/tir/generic.py", line 80, in multiply
    return _ffi_api._OpMul(lhs, rhs)
  File "/opt/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 223, in __call__
    values, tcodes, num_args = _make_tvm_args(args, temp_args)
  File "/opt/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 166, in _make_tvm_args
    arg = _FUNC_CONVERT_TO_OBJECT(arg)
  File "/opt/tvm/python/tvm/runtime/object_generic.py", line 74, in convert_to_object
    return value.asobject()
  File "/opt/tvm/python/tvm/te/tensor.py", line 43, in asobject
    return self.tensor(*self.indices)
  File "/opt/tvm/python/tvm/te/tensor.py", line 63, in __call__
    raise ValueError("Need to provide %d index in tensor slice" % ndim)
ValueError: Need to provide 3 index in tensor slice

[INFO] 2021-02-01T03:02:57z src/nnfusion/engine/pass/graph/kernel_tuning.cpp 249        Dot, ir:  - einstein_v2(" output0[N, M] +=! input0[N, K] * input1[K, M]; ", input_dict={ "input0" : { "dtype" : "float32", "shape" : [3200, 20, 128]} ,  "input1" : { "dtype" : "float32", "shape" : [128, 128]} })
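The `ValueError` above appears to come from a rank mismatch in the logged IR: `input0` is declared with shape `[3200, 20, 128]` (three dimensions) but is indexed as `input0[N, K]` (two indices). The following is a minimal sketch of a check that would flag such a mismatch before the IR is handed to Antares. The function names `index_counts` and `find_rank_mismatches` are hypothetical and not part of NNFusion or Antares; the parser only counts bracketed indices and does not implement the full einstein_v2 grammar.

```python
import re

def index_counts(expr, names):
    """Count the top-level indices used for each tensor name in expr."""
    counts = {}
    for name in names:
        for m in re.finditer(r"\b" + re.escape(name) + r"\[", expr):
            depth, commas, i = 1, 0, m.end()
            start = i
            # Walk to the matching "]", counting commas only at depth 1 so that
            # nested accesses such as input0[input1[N0, N1], N2] are handled.
            while depth > 0 and i < len(expr):
                ch = expr[i]
                if ch == "[":
                    depth += 1
                elif ch == "]":
                    depth -= 1
                elif ch == "," and depth == 1:
                    commas += 1
                i += 1
            inner = expr[start:i - 1].strip()
            counts.setdefault(name, set()).add(commas + 1 if inner else 0)
    return counts

def find_rank_mismatches(expr, input_dict):
    """Return (name, indices_used, declared_rank) for every mismatching access."""
    mismatches = []
    for name, used in index_counts(expr, input_dict).items():
        rank = len(input_dict[name]["shape"])
        for n in sorted(used):
            if n != rank:
                mismatches.append((name, n, rank))
    return mismatches

# The Dot IR and input_dict exactly as they appear in the log above.
dot_ir = " output0[N, M] +=! input0[N, K] * input1[K, M]; "
dot_inputs = {
    "input0": {"dtype": "float32", "shape": [3200, 20, 128]},
    "input1": {"dtype": "float32", "shape": [128, 128]},
}
dot_mismatches = find_rank_mismatches(dot_ir, dot_inputs)

# One of the two-dimensional GatherV2 candidates, for comparison.
gather_ir = " output0[N0, N1, N2] = input0[input1[N0, N1], N2]; "
gather_inputs = {
    "input0": {"dtype": "float32", "shape": [32, 128]},
    "input1": {"dtype": "int64", "shape": [64, 10]},
}
gather_mismatches = find_rank_mismatches(gather_ir, gather_inputs)
```

For the Dot IR the check reports `input0` as used with two indices against a declared rank of three, which matches the "Need to provide 3 index in tensor slice" message; the two-dimensional GatherV2 candidate passes.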

nnfbot commented 3 years ago

Thanks for the report @xysmlx! I will look into it ASAP! (I'm a bot).