[Qualcomm] How to know the compilation options used for RetinaNet?

arjunsuresh commented 10 months ago

When we try to reproduce the instructions given here, we get the error below:

krai@9eef7784f211:~$ export SUT=q2_pro_dc
krai@9eef7784f211:~$ axs byquery sut_name=${SUT},kilt_ready,device=qaic,model_name=retinanet,index_file=openimages_cal_images_list.txt,loadgen_scenario=Offline
WARNING:root:[base_qaic_config] parameters file /home/krai/work_collection/axs2qaic_mlperf_3.1/base_qaic_config/data_axs.json did not exist, initializing to empty parameters
WARNING:root:[work_collection] byquery(sut_name=q2_pro_dc,kilt_ready,device=qaic,model_name=retinanet,index_file=openimages_cal_images_list.txt,loadgen_scenario=Offline) did not find anything, but there are tags: {'kilt_ready'} , trying to find a producer...
WARNING:root:[work_collection] A total of 1 matched rules found.

WARNING:root:Matched Rule #1/1: ['kilt_ready', 'device=qaic', 'model_name=retinanet'] from Entry 'model_qaic_retinanet_recipe'...
WARNING:root:Pipeline: [['run']], Cumulative params: {'__query': 'sut_name=q2_pro_dc,kilt_ready,device=qaic,model_name=retinanet,index_file=openimages_cal_images_list.txt,loadgen_scenario=Offline', 'return_saved_record_entry': True, 'device': 'qaic', 'model_name': 'retinanet', 'sut_name': 'q2_pro_dc', 'index_file': 'openimages_cal_images_list.txt', 'loadgen_scenario': 'Offline', 'tags': ['kilt_ready']}
WARNING:root:[base_qaic_model] touch _BEFORE_CODE_LOADING=/home/krai/work_collection/axs2qaic_mlperf_3.1/qaic_tool_parser
WARNING:root:[work_collection] byquery(profile,sut_name=gen_qaic_profile,model_name=retinanet,device=qaic,index_file=openimages_cal_images_list.txt) did not find anything, but there are tags: {'profile'} , trying to find a producer...
WARNING:root:[work_collection] A total of 1 matched rules found.

WARNING:root:Matched Rule #1/1: ['profile', 'device=qaic', 'model_name=retinanet'] from Entry 'profile_qaic_retinanet_recipe'...
WARNING:root:Pipeline: [['run']], Cumulative params: {'__query': 'profile,sut_name=gen_qaic_profile,model_name=retinanet,device=qaic,index_file=openimages_cal_images_list.txt', 'return_saved_record_entry': True, 'device': 'qaic', 'model_name': 'retinanet', 'sut_name': 'gen_qaic_profile', 'index_file': 'openimages_cal_images_list.txt', 'tags': ['profile']}
WARNING:root:[base_qaic_profile] touch _BEFORE_CODE_LOADING=/home/krai/work_collection/axs2qaic_mlperf_3.1/qaic_tool_parser
WARNING:root:[work_collection] byquery(sut_config,sut=gen_qaic_profile,model=retinanet,loadgen_scenario=Offline,device_id=all) did not find anything, but there are tags: {'sut_config'} , trying to find a producer...
WARNING:root:[work_collection] A total of 0 matched rules found.

------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^^', 'execute', [[['get', 'sut_entry'], ['get', 'config_compiletime_profile']]]] the following exception was raised: In pipeline [
    ['get', 'sut_entry']
    ['get', 'config_compiletime_profile']
] step ['get', 'config_compiletime_profile'] cannot be executed on value (None) produced by ['get', 'sut_entry']

The config file referred to here doesn't look right (is it for BERT?).

We also tried the old compiler parameters given here. This gives the output below, and no ELF binary is produced.

$ /opt/qti-aic/exec/qaic-exec -model=`pwd`/retinanet.onnx -load-profile=/home/arjun/profile.yaml -aic-binary-dir=`pwd`/elfs -enable-channelwise -profiling-threads=8 -onnx-define-symbol=batch_size,1 -node-precision-info=`pwd`/node-precision.yaml -aic-enable-depth-first -aic-num-cores=1 -mos=1 -ols=1 -batchsize=1 -quantization-schema=asymmetric -quantization-calibration=None  -execute-nodes-in-fp16=Sigmoid

-quantization-schema is going to be deprecated in a future release, use -quantization-schema-activations and -quantization-schema-constants instead.
Reading ONNX Model from /home/arjun/CM/repos/local/cache/25f4253bb0c7433a/retinanet.onnx
Compile started ............... 
Compiling model with Int8 precision using PGQ.

Dev00 NSP_00 t5: network doorbell wait timeout exceeded (1 time(s)): db@0x400 waiting for 108 last got 98
Dev00 NSP_00 t5: last completed op HVX db@0x400 98
Dev00 NSP_00 t5: last completed op HMX db@0x404 97
Dev00 NSP_00 t5: last completed op DMAIssue db@0x408 107
Dev00 NSP_00 t5: last completed op HVX0DMAComplete db@0x40c 0
Dev00 NSP_00 t5: last completed op HVX1DMAComplete db@0x410 0
Dev00 NSP_00 t5: last completed op HVX2DMAComplete db@0x414 0
Dev00 NSP_00 t5: last completed op HVX3DMAComplete db@0x418 0
Dev00 NSP_00 t5: last completed op HMXDMAComplete db@0x41c 95
Dev00 NSP_00 t5: last completed op DMAComplete db@0x420 107
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x0 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x4 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x8 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0xc 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x10 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x14 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x18 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x1c 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x20 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x24 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x28 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x2c 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x30 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x34 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x38 0
Dev00 NSP_00 t5: last completed op inputsReadyForReadDB db@0x3c 0
Dev00 NSP_00 t5: last completed op outputsReadyForWriteDB db@0x40 1
Dev00 NSP_00 t5: last completed op outputsReadyForWriteDB db@0x44 1
Dev00 NSP_00 t5: last completed op outputsReadyForWriteDB db@0x48 1

psyhtest commented 10 months ago

@arjunsuresh Thank you for reporting the issues. I believe you are reproducing this on AWS DL2q instances with Qualcomm Cloud AI 100 (QAIC100) Standard accelerator cards?

To build the image krai/axs.qaic:deb_1.9.1.25, you'd need access to the QAIC100 Apps/Platform SDK v1.9.1.25. Hopefully, you can build krai/axs.qaic:deb_1.10.0.x.

The config file you referred to is for a machine with 2x Pro (16 NSP cores) cards, i.e. it is incompatible with the DL2q instance. Having said that, the contents are indeed incorrect. We'll look into updating the config file for HPE's DL385 Q8 Std machine, which is the closest to AWS DL2q instances, after the holidays.

arjunsuresh commented 10 months ago

Thank you @psyhtest for your reply. The docker error was on our side - we got past it and could build the docker image. But the model compilation failed - the error has been added to the issue description. We are trying to compile the model on an x86 machine (not on AWS) to get the ELF binary to run on a Thundercomm RB6. I think the config files are wrong even for ResNet50, but there we could manage with the old ones from ck-qaic.

psyhtest commented 10 months ago

@arjunsuresh Which SDK do you have for RB6?

We'll take a look at the assembled SUT configs. We normally generate them on-the-fly when running experiments. For the krai/axs2config repository, however, we regenerated them after collecting all the results, expecting them to be exactly the same. It sounds like that's not the case, unfortunately.

arjunsuresh commented 10 months ago

Thank you @psyhtest. I tested with 1.10.0.193 - would you recommend trying with 1.9.0? For compilation, I tried playing with the -submit-timeout option, but that didn't help with the timeout error. The profile generation takes about a day - that's the challenge in changing the SDK.

arjunsuresh commented 10 months ago

Tried with 1.9.1.25 SDK.

/opt/qti-aic/exec/qaic-exec -model=/home/arjun/CM/repos/local/cache/853549596cd84450/retinanet.onnx -load-profile=/home/arjun/CM/repos/local/cache/c3b78c3b70554bf3/profile.yaml -aic-binary-dir=/home/arjun/CM/repos/local/cache/b972b152d55a4739/elfs -enable-channelwise -onnx-define-symbol=batch_size,1 -node-precision-info=/home/arjun/CM/repos/local/cache/853549596cd84450/node-precision-info.yaml -quantization-schema=asymmetric -quantization-calibration=None  -execute-nodes-in-fp16=Sigmoid -aic-enable-depth-first -aic-num-cores=1 -mos=1 -ols=1
-quantization-schema is going to be deprecated in a future release, use -quantization-schema-activations and -quantization-schema-constants instead.
Reading ONNX Model from /home/arjun/CM/repos/local/cache/853549596cd84450/retinanet.onnx
Compile started ............... 
Compiling model with Int8 precision using PGQ.
Iter[0/0]: model execution took 2295014 ms

No timeout error now. But the ELF binary folder is missing - it exists while the compilation is running but vanishes when the compilation ends.
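A disappearing output folder like this is easy to miss, so a post-run check can help. The following is a minimal sketch in Python, assuming only that a successful compile leaves at least one file under the directory passed to -aic-binary-dir. The helper name `binary_dir_has_output` is hypothetical and not part of qaic-exec or axs, and no particular file names inside the folder are assumed.

```python
from pathlib import Path


def binary_dir_has_output(binary_dir: str) -> bool:
    """Return True if binary_dir exists and contains at least one regular file.

    Hypothetical helper: it only inspects the filesystem and makes no
    assumption about which files qaic-exec writes.
    """
    path = Path(binary_dir)
    return path.is_dir() and any(p.is_file() for p in path.rglob("*"))


if __name__ == "__main__":
    import sys

    # Usage: python check_elfs.py <path given to -aic-binary-dir>
    target = sys.argv[1] if len(sys.argv) > 1 else "elfs"
    if binary_dir_has_output(target):
        print(f"{target}: output files found")
    else:
        print(f"{target}: missing or empty")
```

Running it right after qaic-exec returns gives an explicit signal when the folder is gone, rather than relying on the (identical) console output.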

psyhtest commented 10 months ago

@arjunsuresh We only used SDK v1.9.1.25 in the previous round, so I'm not sure how v1.10.x.y would fare.

Here's the Offline compilation command currently used by the official axs automation workflow:

/opt/qti-aic/exec/qaic-exec -m=/home/krai/work_collection/downloaded_retinanet.onnx/retinanet.onnx \
-aic-binary-dir=/home/krai/work_collection/model_qaic_retinanet_Offline/./elfs \
-aic-num-cores=1 -ols=1 -mos=1 -batchsize=1 -aic-enable-depth-first \
-aic-hw -aic-hw-version=2.0 -compile-only -onnx-define-symbol=batch_size,1 \
-quantization-schema-constants=symmetric_with_uint8 \
-quantization-schema-activations=asymmetric \
-quantization-calibration=None -enable-channelwise \
-node-precision-info=/home/krai/work_collection/axs2qaic-dev_main/node_precision_info_retinanet/node-precision.yaml \
-load-profile=/home/krai/work_collection/profile_qaic_retinanet_openimages_cal_images_list.txt_bs.1/profile.yaml

It should be the same for v3.1.
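To see how this command differs from the one tried earlier in the thread, one option is to compare the option names of the two command lines. The sketch below does that in Python on abbreviated copies of both commands (paths shortened for readability). `flag_names` is a hypothetical helper, not part of qaic-exec or axs, and the comparison is purely textual.

```python
import shlex


def flag_names(command: str) -> set:
    """Return the option names (the text before '=') used in a command line."""
    names = set()
    for token in shlex.split(command)[1:]:  # skip the executable itself
        if token.startswith("-"):
            names.add(token.split("=", 1)[0])
    return names


# Abbreviated copies of the two commands quoted in this thread.
earlier = (
    "/opt/qti-aic/exec/qaic-exec -model=retinanet.onnx -load-profile=profile.yaml "
    "-aic-binary-dir=elfs -enable-channelwise -onnx-define-symbol=batch_size,1 "
    "-node-precision-info=node-precision-info.yaml -quantization-schema=asymmetric "
    "-quantization-calibration=None -execute-nodes-in-fp16=Sigmoid "
    "-aic-enable-depth-first -aic-num-cores=1 -mos=1 -ols=1"
)
official = (
    "/opt/qti-aic/exec/qaic-exec -m=retinanet.onnx -aic-binary-dir=elfs "
    "-aic-num-cores=1 -ols=1 -mos=1 -batchsize=1 -aic-enable-depth-first "
    "-aic-hw -aic-hw-version=2.0 -compile-only -onnx-define-symbol=batch_size,1 "
    "-quantization-schema-constants=symmetric_with_uint8 "
    "-quantization-schema-activations=asymmetric "
    "-quantization-calibration=None -enable-channelwise "
    "-node-precision-info=node-precision.yaml -load-profile=profile.yaml"
)

only_in_official = sorted(flag_names(official) - flag_names(earlier))
only_in_earlier = sorted(flag_names(earlier) - flag_names(official))
print("only in official:", only_in_official)
print("only in earlier: ", only_in_earlier)
```

Because only the names are compared, `-m` and `-model` are reported as different options; whether they are aliases is not checked here. Differences in option values are also ignored.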

psyhtest commented 10 months ago

Also, please note that, starting from SDK v1.11.u.v (and probably v1.10.x.y), you need to define some constant indices differently:

#ifdef SDK_1_11_X
  const int CLASSES_INDEX = 5;
  const int BOXES_INDEX = 10;
  const int TOPK_INDEX = 0;
#else
  const int CLASSES_INDEX = 0;
  const int BOXES_INDEX = 5;
  const int TOPK_INDEX = 10;
#endif
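For scripts that need the same switch outside the C++ build, the selection can be mirrored in Python. This is only a sketch of the rule in the snippet above: the `OutputIndices` type, the `output_indices` function and its `new_layout_from` parameter are hypothetical names, and the default threshold of 1.11 reflects the uncertainty about v1.10.x.y noted above.

```python
from typing import NamedTuple, Tuple


class OutputIndices(NamedTuple):
    classes: int
    boxes: int
    topk: int


def output_indices(sdk_version: str,
                   new_layout_from: Tuple[int, int] = (1, 11)) -> OutputIndices:
    """Pick the index constants for a dotted SDK version such as '1.9.1.25'.

    new_layout_from is the first (major, minor) release assumed to use the
    newer layout; pass (1, 10) if v1.10.x.y turns out to need it too.
    """
    major_minor = tuple(int(part) for part in sdk_version.split(".")[:2])
    if major_minor >= new_layout_from:
        return OutputIndices(classes=5, boxes=10, topk=0)
    return OutputIndices(classes=0, boxes=5, topk=10)


print(output_indices("1.9.1.25"))
print(output_indices("1.10.0.193", new_layout_from=(1, 10)))
```

Keeping the threshold as a parameter avoids hard-coding an answer to the open question of which layout v1.10.x.y uses.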

arjunsuresh commented 10 months ago

Thanks a lot @psyhtest, especially for the NMS changes. My bad: -aic-hw is the flag I had missed, so instead of compiling a binary, I believe qaic-exec was doing a simulation run.

Feedback to the qaic team: qaic-exec could print a message stating what it is doing (simulation or compilation). Currently, the output is the same in both cases:

Reading ONNX Model from /home/arjun/CM/repos/local/cache/baeff5e7bd24491c/retinanet.onnx
Compile started ............... 
Compiling model with Int8 precision using PGQ.

mlcommons / inference_results_v3.1

[Qualcomm] How to know the compilation options used for RetinaNet? #13