A bmodel that TPU-MLIR converts successfully outputs < nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ...

I compiled the unet from Stable Diffusion using the master branch. The compilation steps were as follows:

model_transform.py \
    --model_name unet \
    --input_shape [[1,4,96,64],[1],[1,77,768],[1,1280,12,8],[1,320,96,64],[1,320,96,64],[1,320,96,64],[1,320,48,32],[1,640,48,32],[1,640,48,32],[1,640,24,16],[1,1280,24,16],[1,1280,24,16],[1,1280,12,8],[1,1280,12,8],[1,1280,12,8]] \
    --model_def ./unet.pt \
    --mlir unet3.mlir

# quantize can be BF16, F16, or F32
model_deploy.py \
    --mlir unet3.mlir \
    --quantize BF16 \
    --chip bm1684x \
    --model unet_1684x_BF16_1_4_96_64.bmodel
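
As a sanity check on the shapes passed to --input_shape, the element count and float32 byte size of each input can be computed and compared with the len= and bytesize= values that bmrt_test prints further down. This is only a minimal sketch: the helper name tensor_sizes is hypothetical, and the 4-byte item size assumes every input is FLOAT32, as the bmrt_test log reports.

```python
from math import prod

# Shapes copied from the --input_shape argument of model_transform.py.
INPUT_SHAPES = [
    [1, 4, 96, 64], [1], [1, 77, 768], [1, 1280, 12, 8],
    [1, 320, 96, 64], [1, 320, 96, 64], [1, 320, 96, 64],
    [1, 320, 48, 32], [1, 640, 48, 32], [1, 640, 48, 32],
    [1, 640, 24, 16], [1, 1280, 24, 16], [1, 1280, 24, 16],
    [1, 1280, 12, 8], [1, 1280, 12, 8], [1, 1280, 12, 8],
]

FLOAT32_BYTES = 4  # assumption: all inputs are FLOAT32, as in the log


def tensor_sizes(shapes, itemsize=FLOAT32_BYTES):
    """Return one (element_count, byte_size) tuple per shape."""
    return [(prod(shape), prod(shape) * itemsize) for shape in shapes]


if __name__ == "__main__":
    for index, (count, nbytes) in enumerate(tensor_sizes(INPUT_SHAPES)):
        print(f"input #{index}: len={count} bytesize={nbytes}")
```

If the computed values match the log, the shapes reached the bmodel unchanged, so the nan output is unlikely to come from a shape mismatch.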

The model compiles successfully, but running bmrt_test on the 1684X shows that the output values are incorrect, which affects the downstream pipeline.

The output printed by bmrt_test is as follows:

(.venv) linaro@bm1684:/data/ssd/docs_check/SD-lcm-tpu/models/basic/realcartoonRealistic$ bmrt_test --bmodel unet_1684x_BF16_1_4_96_64.bmodel 
[BMRT][deal_with_options:1446] INFO:Loop num: 1
[BMRT][bmrt_test:723] WARNING:setpriority failed, cpu time might flutuate.
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init 
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [unet_1684x_BF16_1_4_96_64.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
[BMRT][show_net_info:1520] INFO: ########################
[BMRT][show_net_info:1521] INFO: NetName: unet, Index=0
[BMRT][show_net_info:1523] INFO: ---- stage 0 ----
[BMRT][show_net_info:1532] INFO:   Input 0) 'sample.1' shape=[ 1 4 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 1) 'timestep.1' shape=[ 1 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 2) 'encoder_hidden_states.1' shape=[ 1 77 768 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 3) 'mid_block_additional_residual.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 4) 'down_block_additional_residuals_0.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 5) 'down_block_additional_residuals_1.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 6) 'down_block_additional_residuals_2.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 7) 'down_block_additional_residuals_3.1' shape=[ 1 320 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 8) 'down_block_additional_residuals_4.1' shape=[ 1 640 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 9) 'down_block_additional_residuals_5.1' shape=[ 1 640 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 10) 'down_block_additional_residuals_6.1' shape=[ 1 640 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 11) 'down_block_additional_residuals_7.1' shape=[ 1 1280 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 12) 'down_block_additional_residuals_8.1' shape=[ 1 1280 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 13) 'down_block_additional_residuals_9.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 14) 'down_block_additional_residuals_10.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO:   Input 15) 'down_block_additional_residuals_11.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1542] INFO:   Output 0) '5702_f32' shape=[ 1 4 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1545] INFO: ########################
[BMRT][bmrt_test:782] INFO:==> running network #0, name: unet, loop: 0
[BMRT][bmrt_test:868] INFO:reading input #0, bytesize=98304
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=24576
[BMRT][bmrt_test:868] INFO:reading input #1, bytesize=4
[BMRT][print_array:706] INFO:  --> input_data: < 0 >
[BMRT][bmrt_test:868] INFO:reading input #2, bytesize=236544
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=59136
[BMRT][bmrt_test:868] INFO:reading input #3, bytesize=491520
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #4, bytesize=7864320
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #5, bytesize=7864320
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #6, bytesize=7864320
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #7, bytesize=1966080
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #8, bytesize=3932160
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=983040
[BMRT][bmrt_test:868] INFO:reading input #9, bytesize=3932160
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=983040
[BMRT][bmrt_test:868] INFO:reading input #10, bytesize=983040
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=245760
[BMRT][bmrt_test:868] INFO:reading input #11, bytesize=1966080
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #12, bytesize=1966080
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #13, bytesize=491520
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #14, bytesize=491520
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #15, bytesize=491520
[BMRT][print_array:706] INFO:  --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:1005] INFO:reading output #0, bytesize=98304
[BMRT][print_array:706] INFO:  --> output ref_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=24576
[BMRT][bmrt_test:1039] INFO:net[unet] stage[0], launch total time is 228892 us (npu 228767 us, cpu 125 us)
[BMRT][bmrt_test:1042] INFO:+++ The network[unet] stage[0] output_data +++
[BMRT][print_array:706] INFO:output data #0 shape: [1 4 96 64 ] < nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ... > len=24576
[BMRT][bmrt_test:1083] INFO:load input time(s): 0.039901
[BMRT][bmrt_test:1084] INFO:calculate  time(s): 0.228904
[BMRT][bmrt_test:1085] INFO:get output time(s): 0.000147
[BMRT][bmrt_test:1086] INFO:compare    time(s): 0.000144

The output values are nan nan nan nan ....
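
When recompiling with different quantize settings, the check for nan can be automated by scanning a saved bmrt_test log. The following is a rough sketch that relies only on the "output data #N shape: [...] < ... >" line format visible in the log above. The function name outputs_with_nan is hypothetical. Because bmrt_test prints only the first few values of each output, this catches nan in the preview, not in the whole tensor.

```python
import re

# Matches the value preview printed between "<" and ">" on the
# "output data #N" lines of a bmrt_test log.
OUTPUT_LINE = re.compile(r"output data #(\d+) shape: \[([^\]]*)\]\s*<([^>]*)>")


def outputs_with_nan(log_text):
    """Return the indices of outputs whose printed preview contains nan."""
    bad = []
    for match in OUTPUT_LINE.finditer(log_text):
        values = match.group(3).split()
        if any(value.lower() in ("nan", "-nan") for value in values):
            bad.append(int(match.group(1)))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python check_nan.py saved_bmrt_test_log.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(outputs_with_nan(handle.read()))
```

For the log in this report the function returns a list containing output index 0.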
