Closed: zifeng-radxa closed this issue 5 months ago.
I compiled the UNet from Stable Diffusion using the master branch. The compilation steps were as follows:
model_transform.py \
    --model_name unet \
    --input_shape [[1,4,96,64],[1],[1,77,768],[1,1280,12,8],[1,320,96,64],[1,320,96,64],[1,320,96,64],[1,320,48,32],[1,640,48,32],[1,640,48,32],[1,640,24,16],[1,1280,24,16],[1,1280,24,16],[1,1280,12,8],[1,1280,12,8],[1,1280,12,8]] \
    --model_def ./unet.pt \
    --mlir unet3.mlir

# quantize: BF16, F16, or F32 can be chosen
model_deploy.py \
    --mlir unet3.mlir \
    --quantize BF16 \
    --chip bm1684x \
    --model unet_1684x_BF16_1_4_96_64.bmodel
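The 16-entry --input_shape list is easy to mistype by hand. A minimal sketch, assuming the channel/downsampling pattern copied from the command above holds for other latent sizes, can regenerate it from the latent height and width. The helpers `unet_input_shapes` and `input_shape_arg` are hypothetical and not part of tpu-mlir.

```python
# Hypothetical helper (not a tpu-mlir API): rebuilds the --input_shape argument
# for model_transform.py from the latent height/width. The (channels, factor)
# pattern below is copied from the shapes in the command above and is assumed
# to hold for other resolutions.

DOWN_RESIDUALS = [  # (channels, downsample factor) for down_block_additional_residuals_0..11
    (320, 1), (320, 1), (320, 1),
    (320, 2), (640, 2), (640, 2),
    (640, 4), (1280, 4), (1280, 4),
    (1280, 8), (1280, 8), (1280, 8),
]


def unet_input_shapes(latent_h, latent_w, text_len=77, text_dim=768):
    shapes = [
        [1, 4, latent_h, latent_w],                # sample
        [1],                                       # timestep
        [1, text_len, text_dim],                   # encoder_hidden_states
        [1, 1280, latent_h // 8, latent_w // 8],   # mid_block_additional_residual
    ]
    for channels, factor in DOWN_RESIDUALS:
        shapes.append([1, channels, latent_h // factor, latent_w // factor])
    return shapes


def input_shape_arg(shapes):
    # Format as [[a,b,...],[...]] without spaces, like the command above
    return "[" + ",".join("[" + ",".join(str(d) for d in s) + "]" for s in shapes) + "]"


if __name__ == "__main__":
    print(input_shape_arg(unet_input_shapes(96, 64)))
```

For a 96x64 latent this prints exactly the list passed to model_transform.py above.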
The model compiles successfully, but when running bmrt_test on the BM1684X the output values are incorrect, which breaks the downstream pipeline.
Below is the output printed by bmrt_test:
(.venv) linaro@bm1684:/data/ssd/docs_check/SD-lcm-tpu/models/basic/realcartoonRealistic$ bmrt_test --bmodel unet_1684x_BF16_1_4_96_64.bmodel
[BMRT][deal_with_options:1446] INFO:Loop num: 1
[BMRT][bmrt_test:723] WARNING:setpriority failed, cpu time might flutuate.
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [unet_1684x_BF16_1_4_96_64.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
[BMRT][show_net_info:1520] INFO: ########################
[BMRT][show_net_info:1521] INFO: NetName: unet, Index=0
[BMRT][show_net_info:1523] INFO: ---- stage 0 ----
[BMRT][show_net_info:1532] INFO: Input 0) 'sample.1' shape=[ 1 4 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 1) 'timestep.1' shape=[ 1 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 2) 'encoder_hidden_states.1' shape=[ 1 77 768 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 3) 'mid_block_additional_residual.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 4) 'down_block_additional_residuals_0.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 5) 'down_block_additional_residuals_1.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 6) 'down_block_additional_residuals_2.1' shape=[ 1 320 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 7) 'down_block_additional_residuals_3.1' shape=[ 1 320 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 8) 'down_block_additional_residuals_4.1' shape=[ 1 640 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 9) 'down_block_additional_residuals_5.1' shape=[ 1 640 48 32 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 10) 'down_block_additional_residuals_6.1' shape=[ 1 640 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 11) 'down_block_additional_residuals_7.1' shape=[ 1 1280 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 12) 'down_block_additional_residuals_8.1' shape=[ 1 1280 24 16 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 13) 'down_block_additional_residuals_9.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 14) 'down_block_additional_residuals_10.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1532] INFO: Input 15) 'down_block_additional_residuals_11.1' shape=[ 1 1280 12 8 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1542] INFO: Output 0) '5702_f32' shape=[ 1 4 96 64 ] dtype=FLOAT32 scale=1 zero_point=0
[BMRT][show_net_info:1545] INFO: ########################
[BMRT][bmrt_test:782] INFO:==> running network #0, name: unet, loop: 0
[BMRT][bmrt_test:868] INFO:reading input #0, bytesize=98304
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=24576
[BMRT][bmrt_test:868] INFO:reading input #1, bytesize=4
[BMRT][print_array:706] INFO: --> input_data: < 0 >
[BMRT][bmrt_test:868] INFO:reading input #2, bytesize=236544
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=59136
[BMRT][bmrt_test:868] INFO:reading input #3, bytesize=491520
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #4, bytesize=7864320
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #5, bytesize=7864320
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #6, bytesize=7864320
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=1966080
[BMRT][bmrt_test:868] INFO:reading input #7, bytesize=1966080
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #8, bytesize=3932160
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=983040
[BMRT][bmrt_test:868] INFO:reading input #9, bytesize=3932160
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=983040
[BMRT][bmrt_test:868] INFO:reading input #10, bytesize=983040
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=245760
[BMRT][bmrt_test:868] INFO:reading input #11, bytesize=1966080
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #12, bytesize=1966080
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=491520
[BMRT][bmrt_test:868] INFO:reading input #13, bytesize=491520
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #14, bytesize=491520
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:868] INFO:reading input #15, bytesize=491520
[BMRT][print_array:706] INFO: --> input_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=122880
[BMRT][bmrt_test:1005] INFO:reading output #0, bytesize=98304
[BMRT][print_array:706] INFO: --> output ref_data: < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > len=24576
[BMRT][bmrt_test:1039] INFO:net[unet] stage[0], launch total time is 228892 us (npu 228767 us, cpu 125 us)
[BMRT][bmrt_test:1042] INFO:+++ The network[unet] stage[0] output_data +++
[BMRT][print_array:706] INFO:output data #0 shape: [1 4 96 64 ] < nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan ... > len=24576
[BMRT][bmrt_test:1083] INFO:load input time(s): 0.039901
[BMRT][bmrt_test:1084] INFO:calculate time(s): 0.228904
[BMRT][bmrt_test:1085] INFO:get output time(s): 0.000147
[BMRT][bmrt_test:1086] INFO:compare time(s): 0.000144
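The len and bytesize fields in the log can be cross-checked against --input_shape: every tensor is FLOAT32 (4 bytes per element), so len is the product of the shape dimensions and bytesize is len times 4. This short sketch checks a few entries copied from the log above. It only confirms that the inputs and output were laid out as intended; it does not by itself explain the NaN output.

```python
from math import prod

FLOAT32_BYTES = 4  # every tensor in the log has dtype=FLOAT32


def tensor_len(shape):
    # Number of elements = product of the shape dimensions
    return prod(shape)


def tensor_bytes(shape, elem_bytes=FLOAT32_BYTES):
    return tensor_len(shape) * elem_bytes


# (shape, len, bytesize) triples copied from the bmrt_test log above
LOGGED = [
    ([1, 4, 96, 64], 24576, 98304),        # input #0 'sample.1' and output #0
    ([1, 77, 768], 59136, 236544),         # input #2 'encoder_hidden_states.1'
    ([1, 1280, 12, 8], 122880, 491520),    # input #3 'mid_block_additional_residual.1'
    ([1, 320, 96, 64], 1966080, 7864320),  # inputs #4 to #6
    ([1, 640, 24, 16], 245760, 983040),    # input #10
]

for shape, logged_len, logged_bytes in LOGGED:
    assert tensor_len(shape) == logged_len
    assert tensor_bytes(shape) == logged_bytes
    print(shape, tensor_len(shape), tensor_bytes(shape))
```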
The output data is all NaN (nan nan nan nan ...).
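If you dump the raw output buffer from your own inference code (how you obtain it is not covered in this issue and is assumed here), a small stdlib-only check like the following counts how many FLOAT32 values are NaN. The function name `count_nan_float32` is hypothetical, and the buffer is assumed to be little-endian float32, sized like output #0 above (24576 values, 98304 bytes).

```python
import math
import struct
from typing import Tuple


def count_nan_float32(raw: bytes) -> Tuple[int, int]:
    """Decode little-endian float32 values and return (nan_count, total)."""
    if len(raw) % 4:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    n = len(raw) // 4
    values = struct.unpack(f"<{n}f", raw)
    nan_count = sum(1 for v in values if math.isnan(v))
    return nan_count, n


if __name__ == "__main__":
    # Synthetic 98304-byte buffer standing in for output #0 (len=24576).
    # bytes.fromhex("0000c07f") is a quiet NaN (0x7fc00000) in little-endian float32.
    buf = struct.pack("<2f", 1.0, -0.5) + bytes.fromhex("0000c07f") * 24574
    print(count_nan_float32(buf))  # (24574, 24576)
```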
The new version of tpu-mlir has solved this problem.