microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License

[BUG] Constant folding check failure in NNFUSION_CHECK(inputs[i].size() == _size) #217

Open xysmlx opened 3 years ago

xysmlx commented 3 years ago

🐛 Bug

Enabling constant folding with -fconst_folding_backend=CUDA for a GNN model leads to a check failure in the constant folding pass:

[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 242   Runtime Constant Folding Pass starts up for Graph: Graph_1
[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 58    >> Found constant downstream node: 394, Op Type = GatherV2
[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 392, Op Type = Constant/Constant
[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 392, Memory Length = 24
[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 393, Op Type = Constant/Constant
[INFO] 2021-02-03T02:23:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 393, Memory Length = 8
constant_folding debug: 24 24 element::Type{64, 0, 1, 0, "int64_t"}
constant_folding debug: 8 8 element::Type{64, 0, 1, 0, "int64_t"}
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 154     For node `394`: get runtime output results of size 1
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 210     Finish folding 1th node: name = graph_node_580/graph_node_580, type =
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 213
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 58    >> Found constant downstream node: 391, Op Type = GatherV2
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 390, Op Type = Constant/Constant
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 390, Memory Length = 8
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 389, Op Type = Constant/Constant
[INFO] 2021-02-03T02:23:53z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 389, Memory Length = 24
constant_folding debug: 8 24 element::Type{64, 0, 1, 0, "int64_t"}
[ERROR] 2021-02-03T02:23:53z src/nnfusion/util/errors.hpp 169   Check failed: 'inputs[i].size() == _size' at /home/lingm/projects0/nnfusion_mlx/src/nnfusion/engine/profiler/profiler.hpp:100:
(no explanation given)
terminate called after throwing an instance of 'nnfusion::errors::CheckError'
  what():  Check failed: 'inputs[i].size() == _size' at /home/lingm/projects0/nnfusion_mlx/src/nnfusion/engine/profiler/profiler.hpp:100:
(no explanation given)
Aborted (core dumped)

Here is the code where the check fails (src/nnfusion/engine/profiler/profiler.hpp):

// multiple inputs (or outputs) may have different element types
bool mixed_type_execute(const vector<vector<char>>& inputs,
                        vector<vector<char>>& outputs)
{
    auto& kernel_mem = pctx->kernel_memory;
    auto kctx = pctx->kernel->m_context;
    NNFUSION_CHECK(inputs.size() == kctx->inputs.size());

    for (size_t i = 0; i < kctx->inputs.size(); i++)
    {
        auto& t = kctx->inputs[i];
        size_t _size = t->size();

        std::cout << "constant_folding debug: " << inputs[i].size() << " " << _size
                    << " " << t->get_shape() << " " << t->get_element_type() << std::endl;
        NNFUSION_CHECK(inputs[i].size() == _size); // check failure here

        kernel_mem->load_input_from(i, inputs[i].data(), _size);
    }

    if (rt->execute(pctx, kernel_mem->unsafe_inputs(), kernel_mem->unsafe_outputs()) <
        0)
    {
        NNFUSION_LOG(ERROR) << "Failed execute the kernel.";
        return false;
    }

    outputs.clear();
    void** ptrs = kernel_mem->unsafe_outputs();
    for (size_t i = 0; i < kctx->outputs.size(); ++i)
    {
        auto& t = kctx->outputs[i];
        size_t _size = t->size();

        NNFUSION_CHECK(ptrs[i] != nullptr);
        vector<char> output(_size);
        memcpy(output.data(), ptrs[i], _size);

        outputs.push_back(move(output));
    }
    return true;
}

To Reproduce

Steps to reproduce the behavior:

  1. nnfusion gnn.onnx -f onnx -fconst_folding_backend=CUDA
nnfbot commented 3 years ago

Thanks for the report @xysmlx! I will look into it ASAP! (I'm a bot).

xysmlx commented 3 years ago

Met the same problem in BERT training:

[INFO] 2021-03-16T04:29:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 58    >> Found constant downstream node: 210, Op Type = GatherV2
[INFO] 2021-03-16T04:29:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 209, Op Type = Constant/Constant
[INFO] 2021-03-16T04:29:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 209, Memory Length = 8
[INFO] 2021-03-16T04:29:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 71      Input of constant downstream node: 208, Op Type = Constant/Constant
[INFO] 2021-03-16T04:29:51z src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp 83      With Constant Input Node: 208, Memory Length = 16
[ERROR] 2021-03-16T04:29:51z src/nnfusion/util/errors.hpp 169   Check failed: 'inputs[i].size() == _size' at /home/lingm/projects0/nnfusion_mlx/src/nnfusion/engine/profiler/profiler.hpp:97:
(no explanation given)
terminate called after throwing an instance of 'nnfusion::errors::CheckError'
  what():  Check failed: 'inputs[i].size() == _size' at /home/lingm/projects0/nnfusion_mlx/src/nnfusion/engine/profiler/profiler.hpp:97:
(no explanation given)
Aborted (core dumped)

To reproduce:

nnfusion bert_train_bs2.onnx -f onnx -fautodiff=true -ftraining_mode=true -ftraining_optimizer='{"optimizer": "SGD", "learning_rate": 0.0001}' -fblockfusion_level=0 -fkernel_fusion_level=0 -fconst_folding_backend=CUDA

bert_train_bs2.onnx is generated from src/python/example/bert.py