Closed hieunm1821 closed 2 months ago
Follow the steps in "Build and run with Android Studio" in the introduction. The binaries need to be placed in the proper directory; check the steps in the YoloNas/resolveDependencies.sh script.
Thanks for your reply. The setup is correct, because I can run the YoloNAS COCO pre-trained weights, but the weights trained on my own dataset do not work.
2024-07-17 17:37:21.020 1539-1848 System.out com.qc.objectdetectionYoloNas I mNetworkLoaded: true runtime_var: D
2024-07-17 17:37:21.023 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I runtime arg D
2024-07-17 17:37:21.023 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I Added DSP
2024-07-17 17:37:21.023 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I SNPE Version = 2.19.0.240124133650_81096
2024-07-17 17:37:21.025 1539-1848 System.out com.qc.objectdetectionYoloNas I calling inference
2024-07-17 17:37:21.027 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I infer SNPE S
2024-07-17 17:37:21.027 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I execute_net_BB
2024-07-17 17:37:21.042 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Exec BB status is true
2024-07-17 17:37:21.043 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I =========> platformconfig set: useAdaptivePD:ON
2024-07-17 17:37:21.043 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I =========> platformconfig option: valid
2024-07-17 17:37:21.043 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I runtime sh dsp_fixed8_tf
2024-07-17 17:37:21.043 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I No object detected
2024-07-17 17:37:21.043 1539-1668 System.out com.qc.objectdetectionYoloNas I mNetworkLoaded: true runtime_var: D
2024-07-17 17:37:21.047 1539-1668 System.out com.qc.objectdetectionYoloNas I calling inference
2024-07-17 17:37:21.047 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I infer SNPE S
2024-07-17 17:37:21.047 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I execute_net_BB
2024-07-17 17:37:21.055 1539-2316 com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas I vendor/qcom/proprietary/adsprpc/src/fastrpc_latency.c:122: fastrpc_latency_thread_handler started for QoS with activity window 100 ms
2024-07-17 17:37:21.055 1539-2311 com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas I vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2213: remote_handle_control_domain: requested QOS 1, latency 100 for domain 3 handle 0xb4000079511ee6a0
2024-07-17 17:37:21.055 1539-2311 com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas I vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1746: manage_poll_qos: poll mode updated to 3 for domain 3, handle 0xb4000079511ee6a0 for timeout 9999
2024-07-17 17:37:21.055 1539-2311 com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas I vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2213: remote_handle_control_domain: requested QOS 3, latency 9999 for domain 3 handle 0xb4000079511ee6a0
2024-07-17 17:37:21.079 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I Creating Input Buffer = input.1
2024-07-17 17:37:21.079 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Preprocessing and loading in application Input Buffer for BB
2024-07-17 17:37:21.079 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Filling input.1 buffer
2024-07-17 17:37:21.079 1539-2249 SNPE_INF com.qc.objectdetectionYoloNas I BufferName: input.1 Bufsize: 1228800
2024-07-17 17:37:21.080 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I num of channels: 3
2024-07-17 17:37:21.083 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Exec BB status is true
2024-07-17 17:37:21.083 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Preprocessing and loading in application Input Buffer for BB
2024-07-17 17:37:21.083 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Filling input.1 buffer
2024-07-17 17:37:21.084 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I No object detected
2024-07-17 17:37:21.084 1539-1848 System.out com.qc.objectdetectionYoloNas I mNetworkLoaded: true runtime_var: D
2024-07-17 17:37:21.084 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I num of channels: 3
2024-07-17 17:37:21.086 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Exec BB status is true
2024-07-17 17:37:21.087 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I No object detected
2024-07-17 17:37:21.087 1539-1668 System.out com.qc.objectdetectionYoloNas I mNetworkLoaded: true runtime_var: D
2024-07-17 17:37:21.088 1539-1848 System.out com.qc.objectdetectionYoloNas I calling inference
2024-07-17 17:37:21.090 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I infer SNPE S
2024-07-17 17:37:21.090 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I execute_net_BB
2024-07-17 17:37:21.090 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Preprocessing and loading in application Input Buffer for BB
2024-07-17 17:37:21.090 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Filling input.1 buffer
2024-07-17 17:37:21.091 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I num of channels: 3
2024-07-17 17:37:21.091 1539-1668 System.out com.qc.objectdetectionYoloNas I calling inference
2024-07-17 17:37:21.093 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I Exec BB status is true
2024-07-17 17:37:21.093 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I infer SNPE S
2024-07-17 17:37:21.093 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I execute_net_BB
2024-07-17 17:37:21.094 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Preprocessing and loading in application Input Buffer for BB
2024-07-17 17:37:21.094 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Filling input.1 buffer
2024-07-17 17:37:21.094 1539-1848 SNPE_INF com.qc.objectdetectionYoloNas I No object detected
2024-07-17 17:37:21.095 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I num of channels: 3
2024-07-17 17:37:21.096 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I Exec BB status is true
2024-07-17 17:37:21.097 1539-1668 SNPE_INF com.qc.objectdetectionYoloNas I No object detected
I think the issue is in the quantization steps, or I need to swap the axes of the input or output before post-processing. Any suggestions?
As mentioned in the 2nd point, are you able to generate all DLC files successfully with the pre-trained weights? Please share the step details with the output info. My understanding is that the output names of the latest model are different from those mentioned in the Generate_DLC file. Please check the output names using snpe-dlc-info.
2.
snpe-onnx-to-dlc -i yolo_nas_s.onnx -o yolo_nas_s.dlc
2024-07-18 14:26:32,271 - 235 - INFO - Simplified model validation is successful
2024-07-18 14:26:34,849 - 235 - INFO - INFO_INITIALIZATION_SUCCESS:
2024-07-18 14:26:34,965 - 235 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2024-07-18 14:26:35,188 - 235 - INFO - INFO_WRITE_SUCCESS:
snpe-dlc-quantize --input_dlc yolo_nas_s.dlc --input_list YoloInputlist.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc Quant_intermediate_yoloNas_s_320.dlc
[INFO] InitializeStderr: DebugLog initialized.
[WARNING] --axis_quant is deprecated, use --use_per_channel_quantization option.
[WARNING] --use_enhanced_quantizer option is deprecated, use --param_quantizer and --act_quantizer options.
[WARNING] --use_adjusted_weights_quantizer option is deprecated, use --param_quantizer option.
[INFO] Processed command-line arguments
IrQuantizer: Quantizer param type: adjusted will be deprecated in future releases
IrQuantizer: Quantizer type: adjusted is no longer supported. Using TF quantizer instead
[INFO] Quantized parameters
0.1ms [ INFO ] Inferences will run in sync mode
0.2ms [ INFO ] Initializing logging in the backend. Callback: [0xd18d80], Log Level: [3]
0.2ms [ INFO ] No BackendExtensions lib provided;initializing NetRunBackend Interface
0.3ms [WARNING] Unable to find a device with NetRunDeviceKeyDefault in Library NetRunBackendLibKeyDefault
0.3ms [ INFO ] Entering QuantizeRuntimeApp flow
39.8ms [ INFO ] CpuGraph::finalize
167.6ms [ INFO ] CpuGraph::execute
1020.8ms [ INFO ] cleaning up resources for input tensors
1021.0ms [ INFO ] cleaning up resources for output tensors
1028.6ms [ INFO ] CpuGraph::execute
1679.1ms [ INFO ] cleaning up resources for input tensors
1679.2ms [ INFO ] cleaning up resources for output tensors
1686.3ms [ INFO ] CpuGraph::execute
2344.1ms [ INFO ] cleaning up resources for input tensors
2344.1ms [ INFO ] cleaning up resources for output tensors
2351.2ms [ INFO ] CpuGraph::execute
3011.1ms [ INFO ] cleaning up resources for input tensors
3011.2ms [ INFO ] cleaning up resources for output tensors
3018.9ms [ INFO ] CpuGraph::execute
3677.7ms [ INFO ] cleaning up resources for input tensors
3677.7ms [ INFO ] cleaning up resources for output tensors
3684.9ms [ INFO ] CpuGraph::execute
4336.1ms [ INFO ] cleaning up resources for input tensors
4336.2ms [ INFO ] cleaning up resources for output tensors
4343.3ms [ INFO ] CpuGraph::execute
4998.6ms [ INFO ] cleaning up resources for input tensors
4998.6ms [ INFO ] cleaning up resources for output tensors
5005.7ms [ INFO ] CpuGraph::execute
5661.5ms [ INFO ] cleaning up resources for input tensors
5661.6ms [ INFO ] cleaning up resources for output tensors
5666.4ms [ INFO ] CpuGraph::execute
6313.4ms [ INFO ] cleaning up resources for input tensors
6313.4ms [ INFO ] cleaning up resources for output tensors
6318.5ms [ INFO ] CpuGraph::execute
6966.8ms [ INFO ] cleaning up resources for input tensors
6966.8ms [ INFO ] cleaning up resources for output tensors
[INFO] Generated activations
7955.7ms [ INFO ] Freeing graphsInfo
[INFO] Saved quantized dlc to: Quant_intermediate_yoloNas_s_320.dlc
[INFO] DebugLog shutting down.
snpe-dlc-graph-prepare --input_dlc Quant_intermediate_yoloNas_s_320.dlc --set_output_tensors 885,893 --output_dlc Quant_yoloNas_s_320.dlc --htp_socs sm8550
[INFO] InitializeStderr: DebugLog initialized.
[WARNING] Input[0] has Datatype 0x408.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM8550
[USER_INFO] Target device backend record identifier: HTP_V73_SM8550_8MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V73_SM8550_8MB). Creating a new record
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x126c300
[INFO] Found Interface Provider (v2.13)
[USER_WARNING] QnnDsp <W> Initializing HtpProvider
[USER_WARNING] QnnDsp <W> HTP arch will be deprecated, please set SoC id instead.
[USER_WARNING] QnnDsp <W> Performance Estimates unsupported
[USER_INFO] Platform option not set
[USER_INFO] Offline Prepare VTCM size(MB) selected = 8
[USER_INFO] Offline Prepare Optimization Level passed = 2
[USER_WARNING] QnnDsp <W> Output padding param cannot be set explicitly. Skipping param
[USER_WARNING] QnnDsp <W> Output padding param cannot be set explicitly. Skipping param
[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle:0x1
[USER_INFO] BackendTerminate triggered
[INFO] SNPE HTP Offline Prepare: Successfully created cache for SM8550
[INFO] ======== Run Summary ========
[INFO] SM8550 : Success
[USER_INFO] BackendTerminate triggered
[INFO] DebugLog shutting down.
CPU GPU DSP
Did you modify the output names in the Android app, i.e. in the inference.cpp file and the executeDLC module?
Yes. I thought that is necessary for CPU and GPU to work.
Can you share a snippet of the modification in the inference.cpp file?
I changed name_out_boxes and name_out_classes, and also set 1 class instead of 80 classes:
bool executeDLC(cv::Mat &img, int orig_width, int orig_height, int &numberofobj, std::vector<std::vector<float>> &BB_coords, std::vector<std::string> &BB_names) {
    LOGI("execute_net_BB");
    ATrace_beginSection("preprocessing");
    struct timeval start_time, end_time;
    float milli_time, seconds, useconds;
    mtx.lock();
    assert(snpe_BB!=nullptr);
    if(!loadInputUserBuffer_BB(applicationInputBuffers, snpe_BB, img, inputMap, bitWidth))
    {
        LOGE("Failed to load Input UserBuffer");
        mtx.unlock();
        return false;
    }
    std::string name_out_boxes = "893";
    std::string name_out_classes = "885";
    ATrace_endSection();
    gettimeofday(&start_time, NULL);
    ATrace_beginSection("inference time");
    bool execStatus = snpe_BB->execute(inputMap, outputMap);
    ATrace_endSection();
    ATrace_beginSection("postprocessing time");
    gettimeofday(&end_time, NULL);
    seconds = end_time.tv_sec - start_time.tv_sec; //seconds
    useconds = end_time.tv_usec - start_time.tv_usec; //microseconds
    milli_time = ((seconds) * 1000 + useconds/1000.0);
    //LOGI("Inference time %f ms", milli_time);
    if(execStatus== true){
        LOGI("Exec BB status is true");
    }
    else{
        LOGE("Exec BB status is false");
        mtx.unlock();
        return false;
    }
    std::vector<float32_t> BBout_boxcoords = applicationOutputBuffers.at(name_out_boxes);
    std::vector<float32_t> BBout_class = applicationOutputBuffers.at(name_out_classes);
    std::vector<BoxCornerEncoding> Boxlist;
    std::vector<std::string> Classlist;
    //Post Processing
    for(int i =0;i<(2100);i++) //TODO change value of 2100 to soft value
    {
        // One class score per candidate, so each slice has length 1
        int start = i*1;
        int end = (i+1)*1;
        auto it = max_element (BBout_class.begin()+start, BBout_class.begin()+end);
        int index = distance(BBout_class.begin()+start, it);
        std::string classname = classnamemapping[index];
        if(*it>=0.3 )
        {
            int x1 = BBout_boxcoords[i * 4 + 0];
            int y1 = BBout_boxcoords[i * 4 + 1];
            int x2 = BBout_boxcoords[i * 4 + 2];
            int y2 = BBout_boxcoords[i * 4 + 3];
            Boxlist.push_back(BoxCornerEncoding(x1, y1, x2, y2,*it,classname));
        }
    }
    //LOGI("Boxlist size:: %d",Boxlist.size());
    std::vector<BoxCornerEncoding> reslist = NonMaxSuppression(Boxlist,0.20);
    //LOGI("reslist ssize %d", reslist.size());
    numberofobj = reslist.size();
    float ratio_2 = orig_width/320.0f;
    float ratio_1 = orig_height/320.0f;
    //LOGI("ratio1 %f :: ratio_2 %f",ratio_1,ratio_2);
    for(int k=0;k<numberofobj;k++) {
        float top,bottom,left,right;
        left = reslist[k].y1 * ratio_1; //y1
        right = reslist[k].y2 * ratio_1; //y2
        bottom = reslist[k].x1 * ratio_2; //x1
        top = reslist[k].x2 * ratio_2; //x2
        //LOGI("Coords:: x1:%d :: y1:%d :: x2:%d :: y2:%d",reslist[0].x1,reslist[0].y1,reslist[0].x2,reslist[0].y2);
        //LOGI("after mul: %f %f %f %f",bottom, left, top, right );
        std::vector<float> singleboxcoords{top, bottom, left, right, milli_time};
        BB_coords.push_back(singleboxcoords);
        BB_names.push_back(reslist[k].objlabel);
    }
    ATrace_endSection();
    mtx.unlock();
    return true;
}
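On the TODO about the hard-coded 2100: a minimal sketch of how that number could be derived from the input size instead. It assumes the model emits one candidate per grid cell at detection strides 8, 16 and 32 (an assumption about the exported model, not something confirmed in this thread), and candidateCount is a hypothetical helper, not part of the sample app.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of candidate boxes for a square input,
// assuming one prediction per grid cell at each detection stride.
// The default strides {8, 16, 32} are an assumption about the model.
std::size_t candidateCount(int inputSize,
                           const std::vector<int>& strides = {8, 16, 32}) {
    std::size_t total = 0;
    for (int s : strides) {
        assert(s > 0);
        const std::size_t cells = static_cast<std::size_t>(inputSize / s);
        total += cells * cells;  // a square grid of cells per stride
    }
    return total;
}
```

For a 320x320 input this gives 40*40 + 20*20 + 10*10 = 2100, which matches the loop bound in the snippet above, so the loop could use candidateCount(320) or simply BBout_boxcoords.size() / 4.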
std::string name_out_boxes = "885";
std::string name_out_classes = "893";
Can you try with the above combination and report back?
Not matched. I think
java_vm_ext.cc:591] JNI DETECTED ERROR IN APPLICATION: JNI NewStringUTF called with pending exception java.lang.ArrayIndexOutOfBoundsException: length=100; index=100
Based on this, I think 893 is for boxes and 885 is for classes
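Instead of swapping the names by trial and error, the two outputs can also be told apart by element count. The following is only a sketch under assumptions: it treats the output buffers as a map from tensor name to a float vector, and assumes the box tensor holds 4 values per candidate while the class tensor holds numClasses values per candidate. OutputNames and identifyOutputs are hypothetical names, not part of the sample app.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: which output tensor holds what.
struct OutputNames {
    std::string boxes;
    std::string classes;
};

// Hypothetical helper: match output buffers to roles by element count.
// Returns false when the sizes are ambiguous (numClasses == 4) or when
// either role could not be matched.
bool identifyOutputs(const std::map<std::string, std::vector<float>>& buffers,
                     std::size_t numCandidates, std::size_t numClasses,
                     OutputNames& names) {
    if (numClasses == 4) {
        return false;  // both tensors would have the same size
    }
    bool foundBoxes = false;
    bool foundClasses = false;
    for (const auto& entry : buffers) {
        const std::size_t n = entry.second.size();
        if (n == numCandidates * 4) {
            names.boxes = entry.first;
            foundBoxes = true;
        } else if (n == numCandidates * numClasses) {
            names.classes = entry.first;
            foundClasses = true;
        }
    }
    return foundBoxes && foundClasses;
}
```

With 2100 candidates and 1 class, a buffer of 8400 floats would be reported as the boxes and a buffer of 2100 floats as the classes. Reading the wrong buffer as class scores would let far more candidates pass the threshold than expected, which may be related to the out-of-bounds error above, though that is a guess.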
Another possibility is that this issue is related to precision on the DSP. You can refer to the mixed-precision logic and enhance the model.
Hi @hieunm1821 - Is this issue fixed? Please let us know. We will prioritize support if this is not fixed yet.
Hello @hieunm1821 - Is this issue fixed? Please let us know. We will prioritize support if this is not fixed yet.
Hi. Thanks for following up. I think it is fixed. The cause was the loss of accuracy during quantization.
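To illustrate how a loss of accuracy in 8-bit quantization can turn into "No object detected", here is a toy sketch of a min/max affine quantize-then-dequantize round trip. It is not SNPE's quantizer, and the ranges used below are made up to show the effect rather than measured from this model; fakeQuantize8 is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Toy 8-bit affine quantization round trip: map x onto 256 evenly spaced
// levels between minVal and maxVal, then map it back to float.
// This only illustrates the rounding error; it is not SNPE's implementation.
float fakeQuantize8(float x, float minVal, float maxVal) {
    assert(maxVal > minVal);
    const float scale = (maxVal - minVal) / 255.0f;  // width of one level
    float q = std::round((x - minVal) / scale);      // nearest level index
    q = std::min(255.0f, std::max(0.0f, q));         // clamp to 8 bits
    return minVal + q * scale;                       // dequantized value
}
```

With a range of [0, 1] a score of 0.45 comes back as roughly 0.451, but with an artificially wide range of [0, 320] one level is about 1.25 wide and the same score rounds to 0, which is below the 0.3 threshold used in the post-processing loop.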
Thanks for the update. Good to see developers using our applications to deploy custom trained models.
A quick fix I found is to remove these flags when running snpe-dlc-quantize:
--use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant
I followed the introduction with my own dataset. CPU and GPU show results, but DSP does not. There are no error logs, just messages saying no boxes/detections.