Closed arjunsuresh closed 10 months ago
@arjunsuresh Thank you for reporting the issues. I believe you are reproducing this on AWS DL2q instances with Qualcomm Cloud AI 100 (QAIC100) Standard accelerator cards?
To build the image krai/axs.qaic:deb_1.9.1.25, you'd need access to the QAIC100 Apps/Platform SDK v1.9.1.25. Hopefully, you can build krai/axs.qaic:deb_1.10.0.x.
The config file you referred to is for a machine with 2x Pro (16 NSP core) cards, i.e. it is incompatible with the DL2q instance. Having said that, the contents are indeed incorrect. After the holidays, we'll look into updating the config file for HPE's DL385 Q8 Std machine, which is the closest to AWS DL2q instances.
Thank you @psyhtest for your reply. The docker error was on our side - we got past it and could build the docker image. But the model compilation failed - the error is updated in the issue. We are trying to compile the model on an x86 machine, not on AWS, to get the ELF binary to run on a Thundercomm RB6. I think the config files are wrong even for ResNet50, but there we could manage with the old ones from ck-qaic.
@arjunsuresh Which SDK do you have for RB6?
We'll take a look at the assembled SUT configs. We normally generate them on-the-fly when running experiments. For the krai/axs2config repository, however, we regenerated them after collecting all the results, expecting them to be exactly the same. It sounds like that's not the case, unfortunately.
Thank you @psyhtest. I tested with 1.10.0.193 - would you recommend trying with 1.9.0? For compilation, I tried playing with the -submit-timeout option but that didn't help with the timeout error. The profile generation takes about a day - that's the challenge in changing the SDK.
Tried with 1.9.1.25 SDK.
/opt/qti-aic/exec/qaic-exec -model=/home/arjun/CM/repos/local/cache/853549596cd84450/retinanet.onnx -load-profile=/home/arjun/CM/repos/local/cache/c3b78c3b70554bf3/profile.yaml -aic-binary-dir=/home/arjun/CM/repos/local/cache/b972b152d55a4739/elfs -enable-channelwise -onnx-define-symbol=batch_size,1 -node-precision-info=/home/arjun/CM/repos/local/cache/853549596cd84450/node-precision-info.yaml -quantization-schema=asymmetric -quantization-calibration=None -execute-nodes-in-fp16=Sigmoid -aic-enable-depth-first -aic-num-cores=1 -mos=1 -ols=1
-quantization-schema is going to be deprecated in a future release, use -quantization-schema-activations and -quantization-schema-constants instead.
Reading ONNX Model from /home/arjun/CM/repos/local/cache/853549596cd84450/retinanet.onnx
Compile started ...............
Compiling model with Int8 precision using PGQ.
Iter[0/0]: model execution took 2295014 ms
No timeout error now. But the elf binary folder is missing - it exists during compilation but vanishes when the compilation ends.
@arjunsuresh We only used SDK v1.9.1.25 in the previous round, so I'm not sure how v1.10.x.y would fare.
Here's the Offline compilation command currently used by the official axs automation workflow:
/opt/qti-aic/exec/qaic-exec -m=/home/krai/work_collection/downloaded_retinanet.onnx/retinanet.onnx \
-aic-binary-dir=/home/krai/work_collection/model_qaic_retinanet_Offline/./elfs \
-aic-num-cores=1 -ols=1 -mos=1 -batchsize=1 -aic-enable-depth-first \
-aic-hw -aic-hw-version=2.0 -compile-only -onnx-define-symbol=batch_size,1 \
-quantization-schema-constants=symmetric_with_uint8 \
-quantization-schema-activations=asymmetric \
-quantization-calibration=None -enable-channelwise \
-node-precision-info=/home/krai/work_collection/axs2qaic-dev_main/node_precision_info_retinanet/node-precision.yaml \
-load-profile=/home/krai/work_collection/profile_qaic_retinanet_openimages_cal_images_list.txt_bs.1/profile.yaml
It should be the same for v3.1.
Also, please note that, starting from SDK v1.11.u.v (and probably v1.10.x.y), you need to define some constant indices differently:
#ifdef SDK_1_11_X
const int CLASSES_INDEX = 5;
const int BOXES_INDEX = 10;
const int TOPK_INDEX = 0;
#else
const int CLASSES_INDEX = 0;
const int BOXES_INDEX = 5;
const int TOPK_INDEX = 10;
#endif
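If the same binary has to work with more than one SDK, the choice above could be made at run time instead of with a preprocessor switch. This is only a sketch under stated assumptions: OutputLayout, parse_major_minor, layout_for_sdk and first_new_minor are hypothetical names, the index values are copied from the snippet above, and the cut-over release is a parameter because it is not confirmed whether v1.10.x.y already uses the new ordering.

```cpp
#include <cstdio>
#include <string>

// Base indices of the three output groups, copied from the #ifdef snippet.
struct OutputLayout {
    int classes_index;
    int boxes_index;
    int topk_index;
};

// Hypothetical helper: read "major.minor" from the start of a version string
// such as "1.9.1.25". Returns false if two integers cannot be parsed.
bool parse_major_minor(const std::string& version, int& major, int& minor) {
    return std::sscanf(version.c_str(), "%d.%d", &major, &minor) == 2;
}

// Hypothetical helper: pick the layout for a given SDK version.
// first_new_minor is the first 1.x release assumed to use the new ordering:
// 11 by default; pass 10 if v1.10.x.y turns out to behave the same way.
OutputLayout layout_for_sdk(const std::string& version, int first_new_minor = 11) {
    const OutputLayout old_layout{0, 5, 10};  // the #else branch
    const OutputLayout new_layout{5, 10, 0};  // the SDK_1_11_X branch
    int major = 0;
    int minor = 0;
    if (!parse_major_minor(version, major, minor)) {
        return old_layout;  // assumption: fall back to the older ordering
    }
    const bool is_new = (major > 1) || (major == 1 && minor >= first_new_minor);
    return is_new ? new_layout : old_layout;
}
```

For example, layout_for_sdk("1.9.1.25") yields the older ordering, while layout_for_sdk("1.10.0.193", 10) yields the newer one if v1.10 is treated as the cut-over.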
Thanks a lot @psyhtest, especially for the NMS changes. My bad - -aic-hw is the flag I had missed, and instead of compiling to exe I believe qaic-exec was doing a simulation run.
Feedback to the qaic team: qaic-exec could give verbose output about what it is doing (simulation or compilation). The current outputs are the same in both cases:
Reading ONNX Model from /home/arjun/CM/repos/local/cache/baeff5e7bd24491c/retinanet.onnx
Compile started ...............
Compiling model with Int8 precision using PGQ.
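Because the log does not distinguish the two modes, a wrapper can at least check the argument list before launching qaic-exec. The sketch below assumes that the presence of both -aic-hw and -compile-only (as in the working command earlier in this thread) is what marks a hardware compile; that reading comes from this thread rather than from SDK documentation, and missing_hw_flags is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical pre-flight check: returns whichever of the flags "-aic-hw" and
// "-compile-only" are absent from the argument list. An empty result means
// both are present. Only exact matches count; "-aic-hw-version=2.0" is a
// different flag and does not satisfy "-aic-hw".
std::vector<std::string> missing_hw_flags(const std::vector<std::string>& args) {
    const std::vector<std::string> expected = {"-aic-hw", "-compile-only"};
    std::vector<std::string> missing;
    for (const auto& flag : expected) {
        if (std::find(args.begin(), args.end(), flag) == args.end()) {
            missing.push_back(flag);
        }
    }
    return missing;
}
```

A wrapper script could print the returned flags as a warning and refuse to start a day-long run when the list is non-empty.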
When we try to reproduce the instructions given here, we get the below error.
The referenced config file given here doesn't look right (is it for bert?).
We tried the old compiler params given here, and this gives the below output with no elf binary being produced.