quic / qidk

Running YoloNAS on Snapdragon 8 Gen 2 SM8550 on DSP #28

Closed hieunm1821 closed 2 months ago

hieunm1821 commented 4 months ago

I followed the introduction using my own dataset. The CPU and GPU runtimes show results, but the DSP runtime does not. There are no error logs; the app just reports no boxes/detections.

quic-vraidu commented 4 months ago

Follow the steps under "Build and run with Android Studio" in the introduction. The binaries need to be placed in the proper directory; check the steps in the YoloNas/resolveDependencies.sh script.

hieunm1821 commented 4 months ago

Thanks for your reply. The setup is correct, because I can run the YoloNAS COCO pre-trained weights, but the weights trained on my own dataset do not work.

quic-vraidu commented 4 months ago

  1. Can you share details of the Qualcomm device used?
  2. With the pre-trained weights, did you generate the DLC files, and were you able to run the model on the DSP of your Qualcomm device?
  3. With the custom weights, did you generate the DLC files successfully, without errors?
  4. Did you place the DLC file in the correct folder? Please share the path.
  5. Do you see errors in logcat with the custom-weights DLC file? Please share the logcat output.

hieunm1821 commented 4 months ago

  1. I used a Galaxy S23 Ultra (Snapdragon 8 Gen 2, SM8550).
  2. The model works properly with the CPU and GPU runtimes: it shows detections and draws boxes. The DSP runtime reports no detections at all.
  3. Logcat:
    2024-07-17 17:37:21.020  1539-1848  System.out              com.qc.objectdetectionYoloNas        I  mNetworkLoaded: true runtime_var: D
    2024-07-17 17:37:21.023  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  runtime arg D
    2024-07-17 17:37:21.023  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  Added DSP
    2024-07-17 17:37:21.023  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  SNPE Version = 2.19.0.240124133650_81096
    2024-07-17 17:37:21.025  1539-1848  System.out              com.qc.objectdetectionYoloNas        I  calling inference
    2024-07-17 17:37:21.027  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  infer SNPE S
    2024-07-17 17:37:21.027  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  execute_net_BB
    2024-07-17 17:37:21.042  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Exec BB status is true
    2024-07-17 17:37:21.043  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  =========> platformconfig set: useAdaptivePD:ON
    2024-07-17 17:37:21.043  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  =========> platformconfig option: valid
    2024-07-17 17:37:21.043  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  runtime sh dsp_fixed8_tf
    2024-07-17 17:37:21.043  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  No object detected
    2024-07-17 17:37:21.043  1539-1668  System.out              com.qc.objectdetectionYoloNas        I  mNetworkLoaded: true runtime_var: D
    2024-07-17 17:37:21.047  1539-1668  System.out              com.qc.objectdetectionYoloNas        I  calling inference
    2024-07-17 17:37:21.047  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  infer SNPE S
    2024-07-17 17:37:21.047  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  execute_net_BB
    2024-07-17 17:37:21.055  1539-2316  com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas        I  vendor/qcom/proprietary/adsprpc/src/fastrpc_latency.c:122: fastrpc_latency_thread_handler started for QoS with activity window 100 ms
    2024-07-17 17:37:21.055  1539-2311  com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas        I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2213: remote_handle_control_domain: requested QOS 1, latency 100 for domain 3 handle 0xb4000079511ee6a0
    2024-07-17 17:37:21.055  1539-2311  com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas        I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1746: manage_poll_qos: poll mode updated to 3 for domain 3, handle 0xb4000079511ee6a0 for timeout 9999
    2024-07-17 17:37:21.055  1539-2311  com.qc.obj...ionYoloNas com.qc.objectdetectionYoloNas        I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2213: remote_handle_control_domain: requested QOS 3, latency 9999 for domain 3 handle 0xb4000079511ee6a0
    2024-07-17 17:37:21.079  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I  Creating Input Buffer = input.1
    2024-07-17 17:37:21.079  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Preprocessing and loading in application Input Buffer for BB
    2024-07-17 17:37:21.079  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Filling input.1 buffer 
    2024-07-17 17:37:21.079  1539-2249  SNPE_INF                com.qc.objectdetectionYoloNas        I   BufferName: input.1 Bufsize: 1228800
    2024-07-17 17:37:21.080  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  num of channels: 3
    2024-07-17 17:37:21.083  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Exec BB status is true
    2024-07-17 17:37:21.083  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Preprocessing and loading in application Input Buffer for BB
    2024-07-17 17:37:21.083  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Filling input.1 buffer 
    2024-07-17 17:37:21.084  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  No object detected
    2024-07-17 17:37:21.084  1539-1848  System.out              com.qc.objectdetectionYoloNas        I  mNetworkLoaded: true runtime_var: D
    2024-07-17 17:37:21.084  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  num of channels: 3
    2024-07-17 17:37:21.086  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Exec BB status is true
    2024-07-17 17:37:21.087  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  No object detected
    2024-07-17 17:37:21.087  1539-1668  System.out              com.qc.objectdetectionYoloNas        I  mNetworkLoaded: true runtime_var: D
    2024-07-17 17:37:21.088  1539-1848  System.out              com.qc.objectdetectionYoloNas        I  calling inference
    2024-07-17 17:37:21.090  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  infer SNPE S
    2024-07-17 17:37:21.090  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  execute_net_BB
    2024-07-17 17:37:21.090  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Preprocessing and loading in application Input Buffer for BB
    2024-07-17 17:37:21.090  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Filling input.1 buffer 
    2024-07-17 17:37:21.091  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  num of channels: 3
    2024-07-17 17:37:21.091  1539-1668  System.out              com.qc.objectdetectionYoloNas        I  calling inference
    2024-07-17 17:37:21.093  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  Exec BB status is true
    2024-07-17 17:37:21.093  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  infer SNPE S
    2024-07-17 17:37:21.093  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  execute_net_BB
    2024-07-17 17:37:21.094  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Preprocessing and loading in application Input Buffer for BB
    2024-07-17 17:37:21.094  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Filling input.1 buffer 
    2024-07-17 17:37:21.094  1539-1848  SNPE_INF                com.qc.objectdetectionYoloNas        I  No object detected
    2024-07-17 17:37:21.095  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  num of channels: 3
    2024-07-17 17:37:21.096  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  Exec BB status is true
    2024-07-17 17:37:21.097  1539-1668  SNPE_INF                com.qc.objectdetectionYoloNas        I  No object detected

I think the issue is in the quantization steps, or that I need to swap the axes of the input or output before post-processing. Any suggestions?

quic-vraidu commented 4 months ago

As asked in point 2: with the pre-trained weights, are you able to generate all the DLC files successfully? Please share the steps along with their output. My understanding is that the output names of the latest model differ from those mentioned in the Generate_DLC file. Please check the output names using snpe-dlc-info.

hieunm1821 commented 4 months ago

2. snpe-onnx-to-dlc -i yolo_nas_s.onnx -o yolo_nas_s.dlc

2024-07-18 14:26:32,271 - 235 - INFO - Simplified model validation is successful
2024-07-18 14:26:34,849 - 235 - INFO - INFO_INITIALIZATION_SUCCESS: 
2024-07-18 14:26:34,965 - 235 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2024-07-18 14:26:35,188 - 235 - INFO - INFO_WRITE_SUCCESS:

snpe-dlc-quantize --input_dlc yolo_nas_s.dlc --input_list YoloInputlist.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc Quant_intermediate_yoloNas_s_320.dlc

[INFO] InitializeStderr: DebugLog initialized.
[WARNING] --axis_quant is deprecated, use --use_per_channel_quantization option.
[WARNING] --use_enhanced_quantizer option is deprecated, use --param_quantizer and --act_quantizer options.
[WARNING] --use_adjusted_weights_quantizer option is deprecated, use --param_quantizer option.
[INFO] Processed command-line arguments
IrQuantizer: Quantizer param type: adjusted will be deprecated in future releases
IrQuantizer: Quantizer type: adjusted is no longer supported. Using TF quantizer instead
[INFO] Quantized parameters
     0.1ms [  INFO ] Inferences will run in sync mode
     0.2ms [  INFO ] Initializing logging in the backend. Callback: [0xd18d80], Log Level: [3]
     0.2ms [  INFO ] No BackendExtensions lib provided;initializing NetRunBackend Interface
     0.3ms [WARNING] Unable to find a device with NetRunDeviceKeyDefault in Library NetRunBackendLibKeyDefault
     0.3ms [  INFO ] Entering QuantizeRuntimeApp flow
    39.8ms [  INFO ] CpuGraph::finalize
   167.6ms [  INFO ] CpuGraph::execute
  1020.8ms [  INFO ] cleaning up resources for input tensors
  1021.0ms [  INFO ] cleaning up resources for output tensors
  1028.6ms [  INFO ] CpuGraph::execute
  1679.1ms [  INFO ] cleaning up resources for input tensors
  1679.2ms [  INFO ] cleaning up resources for output tensors
  1686.3ms [  INFO ] CpuGraph::execute
  2344.1ms [  INFO ] cleaning up resources for input tensors
  2344.1ms [  INFO ] cleaning up resources for output tensors
  2351.2ms [  INFO ] CpuGraph::execute
  3011.1ms [  INFO ] cleaning up resources for input tensors
  3011.2ms [  INFO ] cleaning up resources for output tensors
  3018.9ms [  INFO ] CpuGraph::execute
  3677.7ms [  INFO ] cleaning up resources for input tensors
  3677.7ms [  INFO ] cleaning up resources for output tensors
  3684.9ms [  INFO ] CpuGraph::execute
  4336.1ms [  INFO ] cleaning up resources for input tensors
  4336.2ms [  INFO ] cleaning up resources for output tensors
  4343.3ms [  INFO ] CpuGraph::execute
  4998.6ms [  INFO ] cleaning up resources for input tensors
  4998.6ms [  INFO ] cleaning up resources for output tensors
  5005.7ms [  INFO ] CpuGraph::execute
  5661.5ms [  INFO ] cleaning up resources for input tensors
  5661.6ms [  INFO ] cleaning up resources for output tensors
  5666.4ms [  INFO ] CpuGraph::execute
  6313.4ms [  INFO ] cleaning up resources for input tensors
  6313.4ms [  INFO ] cleaning up resources for output tensors
  6318.5ms [  INFO ] CpuGraph::execute
  6966.8ms [  INFO ] cleaning up resources for input tensors
  6966.8ms [  INFO ] cleaning up resources for output tensors
[INFO] Generated activations
  7955.7ms [  INFO ] Freeing graphsInfo
[INFO] Saved quantized dlc to: Quant_intermediate_yoloNas_s_320.dlc
[INFO] DebugLog shutting down.

snpe-dlc-graph-prepare --input_dlc Quant_intermediate_yoloNas_s_320.dlc --set_output_tensors 885,893 --output_dlc Quant_yoloNas_s_320.dlc --htp_socs sm8550

[INFO] InitializeStderr: DebugLog initialized.
[WARNING] Input[0] has Datatype 0x408.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM8550
[USER_INFO] Target device backend record identifier: HTP_V73_SM8550_8MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V73_SM8550_8MB). Creating a new record
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x126c300
[INFO] Found Interface Provider (v2.13)
[USER_WARNING] QnnDsp <W> Initializing HtpProvider
[USER_WARNING] QnnDsp <W> HTP arch will be deprecated, please set SoC id instead.
[USER_WARNING] QnnDsp <W> Performance Estimates unsupported
[USER_INFO] Platform option not set
[USER_INFO] Offline Prepare VTCM size(MB) selected = 8
[USER_INFO] Offline Prepare Optimization Level passed = 2
[USER_WARNING] QnnDsp <W> Output padding param cannot be set explicitly. Skipping param
[USER_WARNING] QnnDsp <W> Output padding param cannot be set explicitly. Skipping param
[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle:0x1
[USER_INFO] BackendTerminate triggered
[INFO] SNPE HTP Offline Prepare: Successfully created cache for SM8550
[INFO] ======== Run Summary ========
[INFO]   SM8550 :  Success
[USER_INFO] BackendTerminate triggered
[INFO] DebugLog shutting down.

[Three images attached: CPU, GPU, DSP]

quic-vraidu commented 4 months ago

Did you modify the output names in the Android app, i.e. in inference.cpp (the executeDLC function)?

hieunm1821 commented 4 months ago

Yes. I thought that was necessary for CPU and GPU to work.

quic-vraidu commented 4 months ago

Can you share a snippet of your modifications to the inference.cpp file?

hieunm1821 commented 4 months ago

I changed name_out_boxes and name_out_classes, and also switched to 1 class instead of 80 classes.

bool executeDLC(cv::Mat &img, int orig_width, int orig_height, int &numberofobj, std::vector<std::vector<float>> &BB_coords, std::vector<std::string> &BB_names) {

    LOGI("execute_net_BB");
    ATrace_beginSection("preprocessing");

    struct timeval start_time, end_time;
    float milli_time, seconds, useconds;

    mtx.lock();
    assert(snpe_BB!=nullptr);

    if(!loadInputUserBuffer_BB(applicationInputBuffers, snpe_BB, img, inputMap, bitWidth))
    {
        LOGE("Failed to load Input UserBuffer");
        mtx.unlock();
        return false;
    }

    std::string name_out_boxes = "893";
    std::string name_out_classes =  "885";

    ATrace_endSection();
    gettimeofday(&start_time, NULL);
    ATrace_beginSection("inference time");

    bool execStatus = snpe_BB->execute(inputMap, outputMap);
    ATrace_endSection();
    ATrace_beginSection("postprocessing time");
    gettimeofday(&end_time, NULL);
    seconds = end_time.tv_sec - start_time.tv_sec; //seconds
    useconds = end_time.tv_usec - start_time.tv_usec; //microseconds
    milli_time = ((seconds) * 1000 + useconds/1000.0);
    //LOGI("Inference time %f ms", milli_time);

    if(execStatus== true){
        LOGI("Exec BB status is true");
    }
    else{
        LOGE("Exec BB status is false");
        mtx.unlock();
        return false;
    }

    std::vector<float32_t> BBout_boxcoords = applicationOutputBuffers.at(name_out_boxes);
    std::vector<float32_t> BBout_class = applicationOutputBuffers.at(name_out_classes);

    std::vector<BoxCornerEncoding> Boxlist;
    std::vector<std::string> Classlist;

    //Post Processing
    for(int i =0;i<(2100);i++)  //TODO change value of 2100 to soft value
    {
        int start = i*1;
        int end = (i+1)*1;

        auto it = max_element (BBout_class.begin()+start, BBout_class.begin()+end);
        int index = distance(BBout_class.begin()+start, it);

        std::string classname = classnamemapping[index];
        if(*it>=0.3 )
        {
            int x1 = BBout_boxcoords[i * 4 + 0];
            int y1 = BBout_boxcoords[i * 4 + 1];
            int x2 = BBout_boxcoords[i * 4 + 2];
            int y2 = BBout_boxcoords[i * 4 + 3];
            Boxlist.push_back(BoxCornerEncoding(x1, y1, x2, y2,*it,classname));
        }
    }

    //LOGI("Boxlist size:: %d",Boxlist.size());
    std::vector<BoxCornerEncoding> reslist = NonMaxSuppression(Boxlist,0.20);
    //LOGI("reslist ssize %d", reslist.size());

    numberofobj = reslist.size();
    float ratio_2 = orig_width/320.0f;
    float ratio_1 = orig_height/320.0f;
    //LOGI("ratio1 %f :: ratio_2 %f",ratio_1,ratio_2);

    for(int k=0;k<numberofobj;k++) {
        float top,bottom,left,right;
        left = reslist[k].y1 * ratio_1;   //y1
        right = reslist[k].y2 * ratio_1;  //y2

        bottom = reslist[k].x1 * ratio_2;  //x1
        top = reslist[k].x2 * ratio_2;   //x2

        //LOGI("Coords:: x1:%d :: y1:%d :: x2:%d :: y2:%d",reslist[0].x1,reslist[0].y1,reslist[0].x2,reslist[0].y2);
        //LOGI("after mul: %f %f %f %f",bottom, left, top, right );

        std::vector<float> singleboxcoords{top, bottom, left, right, milli_time};
        BB_coords.push_back(singleboxcoords);
        BB_names.push_back(reslist[k].objlabel);
    }

    ATrace_endSection();
    mtx.unlock();
    return true;
}
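
The `//TODO` in the loop above notes that the anchor count (2100) is hard-coded. A minimal sketch of one way to derive it from the buffer sizes follows, assuming the box buffer holds four floats per anchor and the class buffer holds `numClasses` scores per anchor, as the loop above suggests. `Candidate` and `collectCandidates` are hypothetical names for illustration, not part of the app.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical container for one candidate detection (not the app's BoxCornerEncoding).
struct Candidate {
    float x1, y1, x2, y2;
    float score;
    int classIndex;
};

// Collects every anchor whose best class score reaches `threshold`.
// Assumed layout: `boxes` has 4 floats per anchor, `scores` has `numClasses` floats per anchor.
inline std::vector<Candidate> collectCandidates(const std::vector<float>& boxes,
                                                const std::vector<float>& scores,
                                                std::size_t numClasses,
                                                float threshold) {
    assert(numClasses > 0);
    std::vector<Candidate> out;
    // Derive the anchor count from the buffers instead of hard-coding it,
    // so a different input resolution or class count needs no source edit.
    const std::size_t numAnchors = std::min(boxes.size() / 4, scores.size() / numClasses);
    for (std::size_t i = 0; i < numAnchors; ++i) {
        const auto first = scores.begin() + static_cast<std::ptrdiff_t>(i * numClasses);
        const auto last = first + static_cast<std::ptrdiff_t>(numClasses);
        const auto best = std::max_element(first, last);
        if (*best >= threshold) {
            out.push_back({boxes[i * 4 + 0], boxes[i * 4 + 1],
                           boxes[i * 4 + 2], boxes[i * 4 + 3],
                           *best,
                           static_cast<int>(std::distance(first, best))});
        }
    }
    return out;
}
```

With one class and a 0.3 threshold, `collectCandidates(BBout_boxcoords, BBout_class, 1, 0.3f)` would select the same anchors as the loop above, provided the buffers are laid out as assumed.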

quic-vraidu commented 4 months ago

std::string name_out_boxes = "885";
std::string name_out_classes = "893";

Can you try the above combination and report back?

hieunm1821 commented 4 months ago

That combination does not work. I get: java_vm_ext.cc:591] JNI DETECTED ERROR IN APPLICATION: JNI NewStringUTF called with pending exception java.lang.ArrayIndexOutOfBoundsException: length=100; index=100
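
The exception reports an index of 100 against an array of length 100. One plausible reading, which is an assumption and is not confirmed in this thread, is that with the names swapped the box coordinates are read as class scores, so more than 100 boxes are returned and overflow a fixed-size array on the Java side. A defensive sketch that caps the number of boxes handed back is shown below; `ScoredBox` and `keepTopK` are hypothetical names, not part of the app.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical box type carrying only what the cap needs.
struct ScoredBox {
    float x1, y1, x2, y2;
    float score;
};

// Returns at most `maxCount` boxes, ordered from highest to lowest score.
inline std::vector<ScoredBox> keepTopK(std::vector<ScoredBox> boxes, std::size_t maxCount) {
    std::sort(boxes.begin(), boxes.end(),
              [](const ScoredBox& a, const ScoredBox& b) { return a.score > b.score; });
    if (boxes.size() > maxCount) {
        // Drop the lowest-scoring tail so the caller never sees more than maxCount entries.
        boxes.erase(boxes.begin() + static_cast<std::ptrdiff_t>(maxCount), boxes.end());
    }
    return boxes;
}
```

Calling something like `keepTopK(detections, 100)` before passing results across JNI would keep the count within the length reported in the exception, if that length is indeed the limit.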

[Image] Based on this, I think 893 is for boxes and 885 is for classes.

quic-vraidu commented 4 months ago

Another possible cause seems to be precision on the DSP. You can refer to the mixed-precision logic to enhance the model.

quic-rneti commented 2 months ago

Hi @hieunm1821 - Is this issue fixed? Please let us know. We will prioritize support if it is not fixed yet.

quic-rneti commented 2 months ago

Hello @hieunm1821 - Is this issue fixed? Please let us know. We will prioritize support if it is not fixed yet.

hieunm1821 commented 2 months ago

Hi. Thanks for following up. I think it is fixed. The cause was a loss of accuracy when quantizing.
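
The logcat output earlier in this thread shows the DSP runtime as `dsp_fixed8_tf`, which suggests 8-bit fixed-point execution. The simplified sketch below shows why a wide value range costs resolution in such a scheme. It is a generic min/max affine quantizer written for illustration, not SNPE's actual implementation, and `Encoding`, `stepSize`, `quantize`, and `dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-tensor encoding: the float range mapped onto 0..255.
struct Encoding {
    float lo;
    float hi;
};

// Size of one quantization step: the smallest difference that survives quantization.
inline float stepSize(const Encoding& e) {
    assert(e.hi > e.lo);
    return (e.hi - e.lo) / 255.0f;
}

// Maps a float to the nearest of 256 levels, clamping values outside the range.
inline std::uint8_t quantize(float x, const Encoding& e) {
    const float level = std::round((x - e.lo) / stepSize(e));
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, level)));
}

// Maps a level back to the float value it represents.
inline float dequantize(std::uint8_t q, const Encoding& e) {
    return e.lo + static_cast<float>(q) * stepSize(e);
}
```

Under these assumptions, a value of 0.4 survives a round trip through a [0, 1] encoding almost unchanged, but collapses to 0 when the encoding spans [0, 320], because one step is then about 1.25. A score that ends up at 0 would fall below the 0.3 threshold used in the post-processing code.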

quic-rneti commented 2 months ago

Thanks for the update. Good to see developers using our applications to deploy custom-trained models.

hieunm1821 commented 1 month ago

I found that a quick fix is to remove these flags when using snpe-dlc-quantize:

--use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant