renesas-rz / rzv_drp-ai_tvm

Extension package of Apache TVM (Machine Learning Compiler) for Renesas DRP-AI accelerators powered by Edgecortix MERA(TM). Based on Apache TVM version: v0.11.1
Apache License 2.0

Question about implementing yolov5 #3

Closed 18520506 closed 1 year ago

18520506 commented 1 year ago

I followed this link: https://renesas.info/wiki/YoloV5_TVM_Implementation_Guide to implement YOLOv5L. When I exported the ONNX model, its structure looked like the image below:

Screenshot from 2022-12-23 17-42-30

But after I translated the ONNX model for DRP-AI, the structure of the translated ONNX looked like this:

Screenshot from 2022-12-23 17-42-27

It looks like the input of the translated ONNX is different from the input of the original YOLOv5 ONNX. Why does the input change after translation? Is it a bug?

simobepu commented 1 year ago

Thank you for your question. It's not a bug. The translated ONNX is only a part of the original ONNX: the part that can be processed by DRP-AI. The part from the input (images) to the first Mul in the original ONNX is processed by the CPU and is not output as ONNX.

image

18520506 commented 1 year ago

Thank you very much for your answer.

18520506 commented 1 year ago

I have a question, @simobepu. What is the correct DRP-AI input image information? I have run some experiments, but I still don't know which one is right.

Experiment 1: I defined the values as follows:

    constexpr static int32_t  TVM_MODEL_IN_W = (640);
    constexpr static int32_t  TVM_MODEL_IN_H = (640);
    constexpr static int32_t  TVM_MODEL_IN_C = (3); 
    num_grids = {80,40,20}
    constexpr static int32_t YOLOV5_NUM_BB= 3;
    constexpr static int32_t YOLOV5_NUM_INF_OUT_LAYER=3;

Note: I used the same bounding box and class prediction processing as for YOLOv3.

If I defined the values correctly, why does the result look like the one below?

Screenshot from 2023-01-04 17-15-42

Experiment 2: I defined the values as follows:

    constexpr static int32_t  TVM_MODEL_IN_W = (320);
    constexpr static int32_t  TVM_MODEL_IN_H = (320);
    constexpr static int32_t  TVM_MODEL_IN_C = (32); 
    num_grids = {80,40,20}
    constexpr static int32_t YOLOV5_NUM_BB= 3;
    constexpr static int32_t YOLOV5_NUM_INF_OUT_LAYER=3;

I used the same bounding box and class prediction processing as for YOLOv3. The result is that nothing was detected. I checked the calculation and found that tx, ty, tw, th, and tc sometimes have NaN values.

If I defined the values correctly, does the error happen because I did not change the inf_pre_process function to perform the processing from the input (images) to the first Mul, as described in the previous answer? Thank you for reading.

wk-mnA commented 1 year ago

Thank you for posting the issue.

All of the values you have tried to define depend on the specific YOLO model. The model defines the input size, the output layer details (such as num_grids, _NUM_BB, _NUM_INF_OUT_LAYER), and the anchor values.
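As an illustration of how these values are tied together, here is a minimal sketch that derives the grid sizes from the model input size. It assumes a standard three-head YOLOv5 model whose output layers use strides of 8, 16, and 32; a customized model may use different strides, so treat the numbers as an assumption to verify against your own config. The helper names `grids_for_input` and `grids_match_input` are hypothetical and are not part of the sample application.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: three output layers with strides 8, 16, and 32, as in the
// standard YOLOv5 models. Check your own model before relying on this.
constexpr std::array<int32_t, 3> kStrides = {8, 16, 32};

// Hypothetical helper: expected grid size of each output layer for a
// square model input of model_in x model_in pixels.
inline std::array<int32_t, 3> grids_for_input(int32_t model_in) {
    assert(model_in > 0);
    return {model_in / kStrides[0],
            model_in / kStrides[1],
            model_in / kStrides[2]};
}

// Hypothetical helper: true when the configured num_grids values are
// consistent with the configured input size under the stride assumption.
inline bool grids_match_input(int32_t model_in,
                              const std::array<int32_t, 3>& num_grids) {
    return grids_for_input(model_in) == num_grids;
}
```

Under this assumption, a 640x640 input gives grids of 80, 40, and 20, while a 320x320 input would give 40, 20, and 10. A check like `grids_match_input(320, {80, 40, 20})` therefore returns false, which is one way to catch an input size and a num_grids setting that do not belong together.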

Check your YOLO config file and model architecture to find the appropriate values for your model. Also, YOLOv5 CPU post-processing (the processing of tx, ty, etc.) may differ from YOLOv3. Check that your training environment and its post-processing match your application source code.
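To make the post-processing difference concrete, the sketch below puts a YOLOv3-style box decode next to a YOLOv5-style one. It assumes the model outputs raw values (before sigmoid), that anchors are given in model-input pixels, and that the YOLOv5 side follows the decode used in the upstream Ultralytics YOLOv5 detection head. The `Box` struct and both function names are hypothetical; compare them with the post-processing of your own training environment rather than taking them as the sample application's code.

```cpp
#include <cassert>
#include <cmath>

// Box center and size, in model-input pixels.
struct Box { float cx, cy, w, h; };

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// YOLOv3-style decode: sigmoid offset inside the cell, exponential size.
inline Box decode_yolov3(float tx, float ty, float tw, float th,
                         int col, int row, float stride,
                         float anchor_w, float anchor_h) {
    assert(stride > 0.0f);
    return {(sigmoid(tx) + col) * stride,
            (sigmoid(ty) + row) * stride,
            anchor_w * std::exp(tw),
            anchor_h * std::exp(th)};
}

// YOLOv5-style decode (assumed to match the upstream detection head):
// the offset spans [-0.5, 1.5] cells and the size is (2 * sigmoid)^2,
// so width and height never exceed 4x the anchor.
inline Box decode_yolov5(float tx, float ty, float tw, float th,
                         int col, int row, float stride,
                         float anchor_w, float anchor_h) {
    assert(stride > 0.0f);
    const float sw = 2.0f * sigmoid(tw);
    const float sh = 2.0f * sigmoid(th);
    return {(2.0f * sigmoid(tx) - 0.5f + col) * stride,
            (2.0f * sigmoid(ty) - 0.5f + row) * stride,
            anchor_w * sw * sw,
            anchor_h * sh * sh};
}
```

The two decodes agree when all raw values are zero and diverge elsewhere, so applying the YOLOv3 formulas to YOLOv5 outputs can shift and rescale boxes even when the input size, grids, and anchors are all set correctly.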