sophgo / tpu-mlir

Machine learning compiler based on MLIR for Sophgo TPU.

Aborted (core dumped) #160

Open jwang-ema opened 8 months ago

jwang-ema commented 8 months ago

When converting the MLIR to an F32 model I get the error below. Is there a more specific error message, or a known fix?

    [Running]: tpuc-opt C-3PO_vgg16bn_mtf_msf_deeplabv3_bm1684_f32_final.mlir --codegen="model_file=C-3PO_vgg16bn_mtf_msf_deeplabv3_1684_f32.bmodel embed_debug_info=false model_version=latest" -o /dev/null
    bmcpu init: skip cpu_user_defined
    Cannot open libusercpu.so, disable user cpu layer.
    in cmodel, enable profile.
    BM1684 DO NOT Support Such Attribute.!!!!
    UNREACHABLE executed at /workspace/lib/Dialect/Tpu/Interfaces/BM1684/Interp.cpp:56!

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

Stack dump:

  1. Program arguments: tpuc-opt C-3PO_vgg16bn_mtf_msf_deeplabv3_bm1684_f32_final.mlir --init "--codegen=model_file=C-3PO_vgg16bn_mtf_msf_deeplabv3_1684_f32.bmodel embed_debug_info=false model_version=latest" --deinit --mlir-print-debuginfo -o /dev/null

    0 0x000055c7ef698987 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspace/install/bin/tpuc-opt+0x603987)

    1 0x000055c7ef6966ae llvm::sys::RunSignalHandlers() (/workspace/install/bin/tpuc-opt+0x6016ae)

    2 0x000055c7ef69930a SignalHandler(int) Signals.cpp:0:0

    3 0x00007faa2259c520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)

    4 0x00007faa225f09fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)

    5 0x00007faa2259c476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)

    6 0x00007faa225827f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)

    7 0x000055c7ef6964d1 (/workspace/install/bin/tpuc-opt+0x6014d1)

    8 0x000055c7efca92a9 (/workspace/install/bin/tpuc-opt+0xc142a9)

    9 0x000055c7efc26acc tpu_mlir::detail::GlobalGenInterfaceInterfaceTraits::Model::codegen_global_bm168x(tpu_mlir::detail::GlobalGenInterfaceInterfaceTraits::Concept const, mlir::Operation) (/workspace/install/bin/tpuc-opt+0xb91acc)

    10 0x000055c7efd964f0 mlir::GlobalGenInterfaceDecorator::codegen_global_bm168x() (/workspace/install/bin/tpuc-opt+0xd014f0)

    11 0x000055c7efd9037c tpu_mlir::tpu::BMCodegen::codegen(mlir::Operation*) (/workspace/install/bin/tpuc-opt+0xcfb37c)

    12 0x000055c7ef7e976e void mlir::detail::walk(mlir::Operation, llvm::function_ref<void (mlir::Operation)>, mlir::WalkOrder) (/workspace/install/bin/tpuc-opt+0x75476e)

    13 0x000055c7efd87a87 tpu_mlir::tpu::BMCodegen::CreateSubNet(mlir::ModuleOp, mlir::func::CallOp) (/workspace/install/bin/tpuc-opt+0xcf2a87)

    14 0x000055c7efd87933 mlir::WalkResult llvm::function_ref<mlir::WalkResult (mlir::Operation)>::callback_fn<std::enable_if<!llvm::is_one_of<mlir::func::FuncOp, mlir::Operation, mlir::Region, mlir::Block>::value && std::is_same<mlir::WalkResult, mlir::WalkResult>::value, mlir::WalkResult>::type mlir::detail::walk<(mlir::WalkOrder)0, mlir::ForwardIterator, tpu_mlir::tpu::BMCodegen::run(mlir::ModuleOp, bool)::$_1, mlir::func::FuncOp, mlir::WalkResult>(mlir::Operation, tpu_mlir::tpu::BMCodegen::run(mlir::ModuleOp, bool)::$_1&&)::'lambda'(mlir::Operation)>(long, mlir::Operation*) BM168xCodegen.cpp:0:0

    15 0x000055c7ef7ea8f8 mlir::WalkResult mlir::detail::walk(mlir::Operation, llvm::function_ref<mlir::WalkResult (mlir::Operation)>, mlir::WalkOrder) (/workspace/install/bin/tpuc-opt+0x7558f8)

    16 0x000055c7ef7ea8a7 mlir::WalkResult mlir::detail::walk(mlir::Operation, llvm::function_ref<mlir::WalkResult (mlir::Operation)>, mlir::WalkOrder) (/workspace/install/bin/tpuc-opt+0x7558a7)

    17 0x000055c7efd85996 tpu_mlir::tpu::BMCodegen::run(mlir::ModuleOp, bool) (/workspace/install/bin/tpuc-opt+0xcf0996)

    18 0x000055c7efd83365 tpu_mlir::tpu::CodegenPass::runOnOperation() (/workspace/install/bin/tpuc-opt+0xcee365)

    19 0x000055c7f0464c34 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass, mlir::Operation, mlir::AnalysisManager, bool, unsigned int) (/workspace/install/bin/tpuc-opt+0x13cfc34)

    20 0x000055c7f0465261 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor, mlir::PassInstrumentation::PipelineParentInfo const*) (/workspace/install/bin/tpuc-opt+0x13d0261)

    21 0x000055c7f0467708 mlir::PassManager::run(mlir::Operation*) (/workspace/install/bin/tpuc-opt+0x13d2708)

    22 0x000055c7ef68a1cb performActions(llvm::raw_ostream&, std::shared_ptr const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) MlirOptMain.cpp:0:0

    23 0x000055c7ef689594 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_2>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, llvm::raw_ostream&) MlirOptMain.cpp:0:0

    24 0x000055c7f066cf28 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/workspace/install/bin/tpuc-opt+0x15d7f28)

    25 0x000055c7ef68389a mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) (/workspace/install/bin/tpuc-opt+0x5ee89a)

    26 0x000055c7ef683d64 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) (/workspace/install/bin/tpuc-opt+0x5eed64)

    27 0x000055c7ef682ae9 main (/workspace/install/bin/tpuc-opt+0x5edae9)

    28 0x00007faa22583d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)

    29 0x00007faa22583e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)

    30 0x000055c7ef682745 _start (/workspace/install/bin/tpuc-opt+0x5ed745)

    Aborted (core dumped)
    Traceback (most recent call last):
      File "/workspace/python/tools/model_deploy.py", line 331, in <module>
        tool.build_model()
      File "/workspace/python/tools/model_deploy.py", line 222, in build_model
        mlir_to_model(self.tpu_mlir, self.model, self.final_mlir, self.dynamic,
      File "/workspace/python/utils/mlir_shell.py", line 169, in mlir_to_model
        _os_system(cmd)
      File "/workspace/python/utils/mlir_shell.py", line 50, in _os_system
        raise RuntimeError("[!Error]: {}".format(cmd_str))
    RuntimeError: [!Error]: tpuc-opt C-3PO_vgg16bn_mtf_msf_deeplabv3_bm1684_f32_final.mlir --codegen="model_file=C-3PO_vgg16bn_mtf_msf_deeplabv3_1684_f32.bmodel embed_debug_info=false model_version=latest" -o /dev/null

lordrebel commented 8 months ago

I guess the mode of the Interp op in your `C-3PO_vgg16bn_mtf_msf_deeplabv3_bm1684_f32_final.mlir` is set to `nearest` and the coord mode is neither `half_pixel` nor `pytorch_half_pixel`, based on the code at lib/Dialect/Tpu/Interfaces/BM1684/Interp.cpp:56

yanzongs commented 5 months ago

> I guess the mode of the Interp op in your `C-3PO_vgg16bn_mtf_msf_deeplabv3_bm1684_f32_final.mlir` is set to `nearest` and the coord mode is neither `half_pixel` nor `pytorch_half_pixel`, based on the code at lib/Dialect/Tpu/Interfaces/BM1684/Interp.cpp:56

I have run into the same problem. What should I do to solve it?

lordrebel commented 5 months ago

> I have run into the same problem. What should I do to solve it?

Sorry, I'm not a contributor, but based on the code it looks like the BM1684 Interp op only supports a coord mode of `half_pixel` or `pytorch_half_pixel` when the mode is set to `nearest`. Any other coord mode falls through to the `llvm_unreachable` call:

    }else if(getMode() == tpu::ResizeMode::nearest){
        if (getCoordMode() == tpu::ResizeCoordMode::half_pixel){
            platform_sp = ONNX_NEAREST;
            half_pixel_centers = 1;
            align_corners = 0;
        } else if (getCoordMode() == tpu::ResizeCoordMode::pytorch_half_pixel){
            platform_sp = PYTORCH_NEAREST;
            half_pixel_centers = 1;
            align_corners = 0;
        } else{
            llvm_unreachable("BM1684 DO NOT Support Such Attribute.!!!!");
        }
    }

Maybe you can change your model, or take a closer look at the Interp op implementation.
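The branch quoted above can be mirrored in a few lines of Python to check a model's resize settings before running `model_deploy.py`. This is only a sketch: the helper names and the example entries are hypothetical and not part of tpu-mlir, the mode strings other than `nearest`, `half_pixel` and `pytorch_half_pixel` are illustrative, and only the `nearest` branch shown in this thread is modelled.

```python
from typing import Iterable, List, Tuple

# Coord modes accepted by the quoted `nearest` branch in Interp.cpp.
SUPPORTED_NEAREST_COORD_MODES = frozenset({"half_pixel", "pytorch_half_pixel"})


def nearest_coord_mode_ok(coord_mode: str) -> bool:
    """True if the quoted branch would NOT reach llvm_unreachable for this coord mode."""
    return coord_mode in SUPPORTED_NEAREST_COORD_MODES


def find_unsupported(resizes: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return the names of nearest-mode entries with an unsupported coord mode.

    Each entry is (name, mode, coord_mode). Entries with another mode are
    skipped because the thread only quotes the `nearest` branch.
    """
    return [
        name
        for name, mode, coord_mode in resizes
        if mode == "nearest" and not nearest_coord_mode_ok(coord_mode)
    ]


# Hypothetical entries, written by hand for illustration.
example = [
    ("resize_a", "nearest", "half_pixel"),
    ("resize_b", "nearest", "asymmetric"),
    ("resize_c", "linear", "align_corners"),
]
print(find_unsupported(example))  # prints ['resize_b']
```

How the (name, mode, coord mode) triples are collected from a real model is left open here, since the thread does not show the textual form of the op's attributes.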

jwang-ema commented 1 month ago

> I have run into the same problem. What should I do to solve it?

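I haven't checked my messages in a long time. My problem has been solved, although I don't remember the details very well, and I later switched to a different backbone. It was probably caused by a dilated (atrous) convolution in the model whose dilation rate was set too large. You could investigate along these lines, if it is still of any help to you. @yanzongs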
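To follow up on the dilation hint, the effective extent of a dilated convolution kernel can be computed with the standard formula `dilation * (kernel_size - 1) + 1`. The sketch below is generic arithmetic, not a tpu-mlir API. The thread does not say which dilation rate was used or what limit, if any, BM1684 imposes, so the values are purely illustrative.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by a dilated convolution kernel along one axis."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be >= 1")
    return dilation * (kernel_size - 1) + 1


# Illustrative dilation rates for a 3x3 kernel (assumed values, not taken from the model).
for dilation in (1, 6, 12, 24):
    print(dilation, effective_kernel_size(3, dilation))
# prints:
# 1 3
# 6 13
# 12 25
# 24 49
```

Comparing this extent with the feature-map size at each dilated layer is one way to spot rates that are unusually large for the model.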