ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to modify Detect layer to allow for converting yolov5 to Qualcomm's SNPE format? #4790

Closed: evdoks closed this issue 3 years ago

evdoks commented 3 years ago

❔Question

I am trying to convert a trained yolov5s model to SNPE format in order to run it on a Snapdragon chip. Unfortunately, Qualcomm's ONNX-to-SNPE converter fails on the Detect layer with the following error message:

ValueError: Unable to permute shape [1, 3, 64, 64, 2] to NSC ordering
2021-09-14 15:15:37,327 - 183 - ERROR - Node Mul_268: Unable to permute shape [1, 3, 64, 64, 2] to NSC ordering

I can imagine it may have something to do with the fact that SNPE currently supports only 4D input data, where the first dimension is batch (see the SNPE docs), while the yolov5 Detect layer performs a 5D reshape.

Would it be possible to modify the Detect layer so that no 5D reshape is performed?
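For reference, the 5D operation in question is this line in Detect.forward() in models/yolo.py (quoted again later in this thread), where the 4D conv output becomes a 5D tensor:

    x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()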

github-actions[bot] commented 3 years ago

👋 Hello @evdoks, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@evdoks I haven't used the SNPE converter myself, so I can't help directly, but I do see Qualcomm compatibility with YOLOv5 officially mentioned in the Snapdragon Neural Processing Engine (SNPE) SDK release notes from March 2021: https://developer.qualcomm.com/sites/default/files/docs/snpe/revision_history.html

[screenshot of the SNPE release notes entry]
evdoks commented 3 years ago

@glenn-jocher Thanks for the link. I saw this, and this is why I was hoping the conversion would work.

However, I was not able to find anyone who could successfully do it on a trained yolo model and there are questions on Qualcomm's dev forum from people hitting the same wall:

[screenshot of Qualcomm forum posts]

The conversion works if one removes the Detect layer (using the --train flag in your export.py script), but then the model is not of much use.

glenn-jocher commented 3 years ago

@evdoks I think you are misunderstanding --train. All models export with all layers; there are no circumstances in which export omits the Detect layer.

evdoks commented 3 years ago

@glenn-jocher, you are right, I expressed it incorrectly. My understanding is that when using the --train flag, the exported ONNX model is in training mode, and I am not quite sure how I can use it for making inferences. At least in my case, the ONNX model stops producing predictions if exported with --train, which is not the case if no training mode is set.

glenn-jocher commented 3 years ago

@evdoks yes in --train mode the grid for inference output is not constructed (as it is not needed for loss computation), so there's something isolated in that area that is causing the issue. The 5D reshape is still present in --train mode though on L55, so it's probably not the source of the problem. You might try turning self.inplace on or off to see if it has an effect.

https://github.com/ultralytics/yolov5/blob/b74dd4ba4f295eaacc8cc3ac75270ba40a2d9ef6/models/yolo.py#L50-L71

evdoks commented 3 years ago

@glenn-jocher thanks for looking into it, but it didn't help: neither exporting the model to ONNX with --inplace, nor training the .pt model with inplace toggled on and off in the YAML file and exporting it to ONNX afterward.

Qualcomm's dev forum seems to be a dead place - some people have already posted questions there regarding yolov5 compatibility but got no response.

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

jayer95 commented 2 years ago

@evdoks Hi, I am also following this issue. Have you made any progress with YOLOv5 on SNPE?

evdoks commented 2 years ago

@jayer95 unfortunately not. Switched to ResNet (which totally sucks). Let us know here if you get any breakthroughs. Qualcomm keeps updating the converter, but I haven't noticed anything relevant to the YOLO issue in the release notes of the latest versions.

jayer95 commented 2 years ago

@evdoks Thank you for your reply. I am currently experimenting intensively. I have successfully converted YOLOv5 to .dlc, but I currently have no way to verify whether the model works.

I don't quite understand the "--train" parameter proposed by the author of YOLOv5 to bypass the 5D layers of the network.

@glenn-jocher Can I ask your opinion?

glenn-jocher commented 2 years ago

@evdoks Detect() does not have a self.train parameter. It has a self.training attribute that returns the raw grid outputs when training, or the sigmoid predictions during inference. https://github.com/ultralytics/yolov5/blob/562191f5756273aca54225903f5933f7683daade/models/yolo.py#L50-L71

jayer95 commented 2 years ago

@glenn-jocher Can you help with converting to SNPE's DLC format? You must know this better than us!!!

glenn-jocher commented 2 years ago

@jayer95 sorry, I don't actually know what DLC is. We have a mobile developer who's working on our upcoming HUB app, but for Android we are using established TFLite export workflows and are not yet targeting specific backends. What's the benefit of going this export route? Does it provide better access to NNAPI or Hexagon delegates?

If the main issue is simply the 5D nature of the tensors in Detect there's certainly a workaround you could do to handle the reshaping/permutation ops differently. You'd want to create an x-y fused grid (1d rather than 2d), and then of course also create the offsets/gains you need in 1d rather than 2d, then your tensor would be 4d (batch, anchors, xy, outputs)
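A rough sketch of what I mean (illustrative only, not tested; decode_4d and the grid naming are made up here, and the decode equations are the v6-style ones from Detect):

    import torch

    def decode_4d(x, anchors, stride, nc=80):
        # x: raw conv output from one Detect branch, shape (bs, na*(nc+5), ny, nx)
        # anchors: float tensor (na, 2) in pixels
        bs, _, ny, nx = x.shape
        na, no = anchors.shape[0], nc + 5
        # stay 4D: fold the y-x grid into a single fused cell dimension
        x = x.view(bs, na, no, ny * nx).permute(0, 1, 3, 2)  # (bs, na, ny*nx, no)

        # 1D fused x-y grid offsets matching the flattened cell order
        yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
        grid = torch.stack((xv, yv), 2).view(1, 1, ny * nx, 2).float()

        y = x.sigmoid()
        y[..., 0:2] = (y[..., 0:2] * 2 - 0.5 + grid) * stride             # xy
        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchors.view(1, na, 1, 2)  # wh
        return y  # (bs, na, ny*nx, no): 4D throughout, no 5D reshape/permute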

hansoullee20 commented 2 years ago

@evdoks @jayer95 It is possible to convert yolov5 to .dlc format. You'd need to use the version 3.1 yolov5s and specify the output nodes as the convolution layer outputs before the 5D reshape. Check out the SNPE release notes, page 20. The exact text is as below:

• Export the pre-trained YOLO v3/v5 to ONNX
  1. Follow the official export tutorial to obtain the ONNX model: https://docs.ultralytics.com/yolov5/tutorials/model_export
  2. Simplify the exported ONNX model with onnx-simplifier: https://github.com/daquexian/onnx-simplifier
  3. Conversion: specify output nodes before the 5D Reshape
     • Example for YOLOv5: snpe-onnx-to-dlc -i yolov5s.onnx --out_node 742 --out_node 762 --out_node 782
     • Example for YOLOv3: snpe-onnx-to-dlc -i yolov3.onnx --out_node 332 --out_node 352 --out_node 372
     • Need to handle 5D ops and postprocessing outside the model

I came as far as getting the 4D output in NativeCpp settings but made zero progress on extracting inferences. Has anyone made any progress?

wwxzxd commented 2 years ago

@evdoks Thank you for your reply. I am currently experimenting intensively. I have successfully converted YOLOv5 to .dlc, but currently have no way to verify whether the model works.

I don't quite understand the "--train" parameter proposed by the author of YOLOv5 to bypass the 5D layers of the network.

@glenn-jocher Can I ask your opinion?

Hello! May I ask how you converted the yolov5 .pt model file into .dlc? Thank you very much.

glenn-jocher commented 2 years ago

@wwxzxd sorry what is dlc?

jayer95 commented 2 years ago

@glenn-jocher please refer: https://developer.qualcomm.com/sites/default/files/docs/snpe/overview.html

glenn-jocher commented 2 years ago

@jayer95 got it, thanks!

@wwxzxd @jayer95 @evdoks The main step we could take here would be to add official Snapdragon DLC export support to export.py. We currently support 10 different model formats, and there is a system in place for export and inference with each. From TFLite, ONNX, CoreML, TensorRT Export #251:

Formats

YOLOv5 export is supported for the following formats

Format | Example | --include ... argument
PyTorch | yolov5s.pt | -
TorchScript | yolov5s.torchscript | torchscript
ONNX | yolov5s.onnx | onnx
CoreML | yolov5s.mlmodel | coreml
OpenVINO | yolov5s_openvino_model/ | openvino
TensorFlow SavedModel | yolov5s_saved_model/ | saved_model
TensorFlow GraphDef | yolov5s.pb | pb
TensorFlow Lite | yolov5s.tflite | tflite
TensorFlow.js | yolov5s_web_model/ | tfjs
TensorRT | yolov5s.engine | engine

The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance.

Please see our ✅ Contributing Guide to get started. Thank you!

hansoullee20 commented 2 years ago

@glenn-jocher thank you for your reply. I'm somewhat relieved to know I'm not alone in this search. The models are converted to .dlc format via the SNPE tools (https://developer.qualcomm.com/sites/default/files/docs/snpe/tools.html). So far SNPE supports conversion from 6 frameworks (TensorFlow, TFLite, ONNX, PyTorch, Caffe, and Caffe2).

I've tried converting the yolov5 model by exporting to both ONNX and TensorFlow (.pb). The issue arises when the converters reach the following line in yolo.py:

    x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

It seems SNPE has a problem converting the permute op. Could this line be rewritten without using permute? I think as long as we get past this part, we will have the dlc model.
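One possibility (an untested sketch, not an official change): keep the anchor dimension fused into the channels so the permute stays 4D, and split (na, no) outside the model afterwards:

    # original (creates a 5D tensor, rejected by SNPE):
    # x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

    # 4D alternative: NHWC-style permute only; the (na, no) split is then
    # done in the .dlc post-processing code outside the model
    x[i] = x[i].permute(0, 2, 3, 1).contiguous()  # (bs, ny, nx, na*no)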

zhiqwang commented 2 years ago

Could this line be rewritten without using permute? I think as long as we get past this part, we will have the dlc model.

FYI @hansoullee20, I guess one workaround is just to remove this line in the Detect module when exporting the ONNX model for the SNPE backend (also setting --train), and perform this operation on the SNPE side.

https://github.com/ultralytics/yolov5/blob/db6ec66a602a0b64a7db1711acd064eda5daf2b3/models/yolo.py#L53-L54

hansoullee20 commented 2 years ago

@zhiqwang but as far as I understand, the --train option in export removes the detect layer right? Wouldn't that be pointless since the detect layer will then need to be processed outside of dlc?

zhiqwang commented 2 years ago

but as far as I understand, the --train option in export removes the detect layer right? Wouldn't that be pointless since the detect layer will then need to be processed outside of dlc?

Seems that it will remove only https://github.com/ultralytics/yolov5/blob/db6ec66a602a0b64a7db1711acd064eda5daf2b3/models/yolo.py#L56-L68 and return a list containing the 3 intermediate heads if you set --train as follows:

python export.py --weights path/to/your/model.pt --include onnx --simplify --train

And if SNPE doesn't support the permute op, it is unlikely to support the torch.meshgrid used in _make_grid at line 58 above.

hansoullee20 commented 2 years ago

So, good news! Seems like yolov5 is now compatible with SNPE! Pull from the master branch, export to onnx, and convert to dlc without specifying out_node. Would appreciate any inputs on how to proceed from here in SNPE :)

jayer95 commented 2 years ago

At present, yolov5 v6.0 can be converted for SNPE correctly.

onnx==1.6.0
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.1.0
scikit-learn==0.19.2
numpy==1.19.5
protobuf==3.17.3
torch==1.10.0

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
git checkout v6.0

python export.py --weights yolov5n.pt --optimize --opset 11 --simplify

Please use Netron to view the exported yolov5n.onnx. You will find that the layers just above the 5D reshapes are 4D Conv outputs: Conv_198, Conv_232, Conv_266, whose output nodes are 326, 379, 432. We need to specify these 3 output nodes when converting yolov5n.dlc.
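For example (assuming the node IDs above; the full conversion command with input dimensions appears later in this thread):

    snpe-onnx-to-dlc -i yolov5n.onnx --out_node 326 --out_node 379 --out_node 432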

But at present, a program is still needed to demo the converted yolov5n.dlc. The most important thing is that the inference program must contain yolov5's "letterbox" preprocessing algorithm, to ensure that "letterbox" is used in inference just as in training.
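For reference, a minimal sketch of the letterbox idea along the lines of yolov5's utils/augmentations.py letterbox() (simplified; the real function also handles stride-multiple padding and scale-up options):

    import cv2

    def letterbox(im, new_shape=(384, 640), color=(114, 114, 114)):
        # resize with preserved aspect ratio, pad the remainder with gray
        h, w = im.shape[:2]
        r = min(new_shape[0] / h, new_shape[1] / w)
        new_unpad = (int(round(w * r)), int(round(h * r)))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
        im = cv2.copyMakeBorder(im, dh // 2, dh - dh // 2, dw // 2, dw - dw // 2,
                                cv2.BORDER_CONSTANT, value=color)
        return im, r, (dw // 2, dh // 2)  # ratio and padding, needed to map boxes back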

zhiqwang commented 2 years ago

Hi @jayer95 ,

Please use Netron to view the exported yolov5n.onnx; you will find that the layers before the reshape into 5D output are Conv_198, Conv_232, Conv_266, and the output nodes are 326, 379, 432, so we need to specify these 3 output nodes when converting yolov5n.dlc.

I have a question here: it seems that the anchor-decode part of Detect below will not apply if we specify the output nodes as 326, 379, 432?

https://github.com/ultralytics/yolov5/blob/6865d19a92d8c160c7fc3c92256627dadce1cd1e/models/yolo.py#L57-L68

jayer95 commented 2 years ago

Hi @zhiqwang ,

Is the reason you converted to yolov5n.dlc that you want to load yolov5.dlc in the SNPE SDK and output the post-processed result?

The correct conversion steps should be as follows: yolov5n.pt --> yolov5n.onnx --> yolov5n.dlc

When we convert yolov5n.onnx to yolov5n.dlc, we specify 3 output nodes: 326, 379, 432 (Conv_198, Conv_232, Conv_266), as shown below:

[screenshot: the three 4D output nodes in Netron]

For the 3 output nodes (Conv_198, Conv_232, Conv_266) and the 4D output format expected by SNPE, please refer to: https://developer.qualcomm.com/sites/default/files/docs/snpe//image_input.html

The SNPE 4D image output format is: batch_size x grid_size x grid_size x (3 x (box_size + score_size + class_size))

batch_size=1, box_size=4, score_size=1, class_size=80

Conv_266 node: 1x20x20x255 (grid_size=20)
Conv_232 node: 1x40x40x255 (grid_size=40)
Conv_198 node: 1x80x80x255 (grid_size=80)

At this time, it has been converted to yolov5n.dlc. The post-processing program for parsing yolov5n.dlc should be developed in C++ on the SNPE SDK or QCS devices. It has nothing to do with the post-processing of "yolov5/models/yolo.py".

I'm using SNPE SDK 1.58 (the latest version at present), when converting yolov5n.dlc, I use "snpe-onnx-to-dlc" under the x86 architecture for model conversion, and use "snpe-dlc-info" to view the model architecture of yolov5n.dlc.
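For anyone writing that post-processing, a hedged NumPy sketch of decoding one such 4D output (assumptions: the runtime delivers NHWC 1 x gs x gs x 255, v6-style decode equations, and you substitute your own anchors and strides; decode_head is an illustrative name, not SNPE API):

    import numpy as np

    def decode_head(out, anchors, stride, nc=80):
        # out: one DLC output, NHWC (1, gs, gs, 3*(nc+5)); returns (N, nc+5) rows in pixels
        _, gs_y, gs_x, _ = out.shape
        out = out.reshape(1, gs_y, gs_x, 3, nc + 5)       # split the 3 anchors
        out = 1.0 / (1.0 + np.exp(-out))                  # sigmoid everything
        gy, gx = np.meshgrid(np.arange(gs_y), np.arange(gs_x), indexing='ij')
        grid = np.stack((gx, gy), axis=-1)[None, :, :, None, :]
        out[..., 0:2] = (out[..., 0:2] * 2 - 0.5 + grid) * stride       # xy
        out[..., 2:4] = (out[..., 2:4] * 2) ** 2 * np.asarray(anchors)  # wh
        return out.reshape(-1, nc + 5)  # then confidence-filter and run NMS

    # e.g. the stride-32 head of a COCO model (P5 anchors from models/yolov5n.yaml):
    # boxes = decode_head(conv_266_out, [(116, 90), (156, 198), (373, 326)], 32)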

Hi @glenn-jocher, let's discuss the yolov5 dlc model supported by Qualcomm SNPE.

Mohit-Ak commented 2 years ago

I was able to convert a custom-trained Yolov5s model to DLC and execute it. There was a very high loss of accuracy though. @jayer95 Would it be possible for you to share your trained models .pt, .onnx and .dlc if they are not proprietary?

I want to see if I am making a mistake in the conversion or in post-processing steps like NMS.

hansoullee20 commented 2 years ago

@Mohit-Ak how were you able to execute the converted .dlc and get results? Would you mind sharing your code or offering some guidelines?

jayer95 commented 2 years ago

@Mohit-Ak

No problem, I provide an example model here.

[screenshot]

In order to ensure that the image preprocessing on the training side and the inference side of qtimlesnpe are consistent, we also used the concept of letterbox in model conversion.

For example, when I specify an image size of 640 during training, the trainer uses 640x640 as the base canvas, and the letterbox algorithm places the training image onto the 640x640 gray canvas.

python train.py \
  --img 640 \
  --batch 64 \
  --epochs 300 \
  --data data/license_plate_hagrid4k_boston.yaml \
  --weights '' \
  --cfg models/yolov5n-license_plate_hagrid4k_boston.yaml

When the training is completed and then converted to .onnx, the script I originally used was:

python export.py --weights yolov5n-license_plate_hagrid4k_boston_last_640x640.pt --optimize --opset 11 --simplify --imgsz 640

But you will find that the input size of this .onnx model has become 640x640, which is not as expected (although it may not have much impact), so we change it to:

python export.py --weights yolov5n-license_plate_hagrid4k_boston_last_640x640.pt --optimize --opset 11 --simplify --imgsz 360 640

or

python export.py --weights yolov5n-license_plate_hagrid4k_boston_last_640x640.pt --optimize --opset 11 --simplify --imgsz 384 640

At this point we will get a "yolov5n-license_plate_hagrid4k_boston_last_640x640.onnx", I renamed it to "yolov5n-license_plate_hagrid4k_boston_last_384x640.onnx"

Then convert to .dlc:

snpe-onnx-to-dlc -i yolov5n-license_plate_hagrid4k_boston_last_384x640.onnx --out_node 326 --out_node 379 --out_node 432 -d images 1,3,384,640

The network of the converted .dlc can be viewed with "snpe-dlc-info":

snpe-dlc-info -i yolov5n-license_plate_hagrid4k_boston_last_384x640.dlc

We can observe that the input size of "yolov5n-license_plate_hagrid4k_boston_last_384x640.dlc" is 384x640, and that the final output layers are Conv_198, Conv_232, Conv_266.

Then write a .config file for object detection with qtimlesnpe on QCS devices:

org.codeaurora.mle.snpe
input_format = 3
BlueMean = 128.0
GreenMean = 128.0
RedMean = 128.0
BlueSigma = 128.0
GreenSigma = 128.0
RedSigma = 128.0
UseNorm = true
preprocess_type = 1
confidence_threshold = 0.5
nms_threshold = 0.5
batch_size = 1
num_threads = 1
max_detection_result = 5
output_layers = < "Conv_266", "Conv_232", "Conv_198" >
runtime = 1
model = "/data/misc/camera/yolov5n-license_plate_hagrid4k_boston_last_384x640.dlc"
labels = "/data/misc/camera/license_plate_labels.txt"

Even with this setup, the .dlc still has a serious accuracy-loss problem: the bboxes appear but are not stable. I am still debugging.

Env:
snpe-1.58.0.3160
Python 3.6.13 :: Anaconda, Inc.
ONNX: 1.6.0

For other Python package versions, please refer to "python-env.txt":

https://drive.google.com/file/d/17cnZ4yVRQg-qxVHPkUYCfx7zK-k68Jy1/view?usp=sharing

hansoullee20 commented 2 years ago

@jayer95 Thank you so much for the detailed response!! This is so awesome!! May I ask what org.codeaurora.mle.snpe is? Is it a specific URL for code?

glenn-jocher commented 2 years ago

@jayer95 @hansoullee20 the important thing is that the image you preprocess and send to the model in deployment is not stretched, i.e. the aspect ratio must remain 1:1 for the objects in the image: a circle remains a circle. You can do this, for example, by exporting at 640x640 and then letterboxing the input, or by exporting at 384x640 and then passing it an image of the same dimensions.

A tiny bit of stretching is OK, e.g. iDetection on iOS stretches a 16:9 video slightly into 320x192 for inference.

glenn-jocher commented 2 years ago

Also note the default deployment thresholds are 0.25 confidence, 0.45 NMS/IoU.
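i.e. if you reuse yolov5's own NMS in a custom pipeline, something like:

    from utils.general import non_max_suppression

    pred = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)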

jayer95 commented 2 years ago

@glenn-jocher

Thanks for your explanation. I have studied letterboxing and stretching for a long time. My final experiments show that preprocessing during inference must be very similar or identical to that during training to obtain the best detection results. In the author's "detect.py", even if image size=640 is specified, a 9:16 video (typically 1080x1920) is still resized to 384x640 using the letterbox concept, and the detection works very well, because this closely matches the preprocessing during training. Therefore, for the .dlc model on the Qualcomm chip, I also specify its input size as 384x640 (very close to a 9:16 aspect ratio, and a multiple of 32), and the inference pipeline on the Qualcomm device likewise adds gray letterbox bars at the top and bottom to ensure the picture is not stretched. Are my models converted and interpreted correctly?

jayer95 commented 2 years ago

@glenn-jocher

Thank you for the reminder about the default conf and NMS values. Since running yolov5.dlc under Qualcomm's software is unprecedented, the accuracy is not yet consistent with the yolov5 PyTorch side. I spent a lot of time developing the parsing algorithm for yolov5 on Qualcomm devices and making sure that the letterbox concept was applied in both inference and training.

jayer95 commented 2 years ago

@hansoullee20

Hi,

This config file is used by Qualcomm's qtimlesnpe plugin, launched via gst-launch, for inference. Officially, only ssd-mobilenet-v1 is claimed to be supported; if you want it to support yolov5, you must modify qtimlesnpe yourself and add the letterbox concept.

org.codeaurora.mle.snpe
input_format = 3
BlueMean = 128.0
GreenMean = 128.0
RedMean = 128.0
BlueSigma = 128.0
GreenSigma = 128.0
RedSigma = 128.0
UseNorm = true
preprocess_type = 1
confidence_threshold = 0.5
nms_threshold = 0.5
batch_size = 1
num_threads = 1
max_detection_result = 5
output_layers = < "Conv_266", "Conv_232", "Conv_198" >
runtime = 1
model = "/data/misc/camera/yolov5n-license_plate_hagrid4k_boston_last_384x640.dlc"
labels = "/data/misc/camera/license_plate_labels.txt"

Due to some non-disclosure agreements, I cannot provide the relevant source code.

If you need sample code that can parse the yolov5.dlc model, you may need to ask @Mohit-Ak whether he can provide his source code!

glenn-jocher commented 2 years ago

@glenn-jocher

Thanks for your explanation. I have studied letterboxing and stretching for a long time. My final experiments show that preprocessing during inference must be very similar or identical to that during training to obtain the best detection results. In the author's "detect.py", even if image size=640 is specified, a 9:16 video (typically 1080x1920) is still resized to 384x640 using the letterbox concept, and the detection works very well, because this closely matches the preprocessing during training. Therefore, for the .dlc model on the Qualcomm chip, I also specify its input size as 384x640 (very close to a 9:16 aspect ratio, and a multiple of 32), and the inference pipeline on the Qualcomm device likewise adds gray letterbox bars at the top and bottom to ensure the picture is not stretched. Are my models converted and interpreted correctly?

Yes, this all looks correct, and yes, you are right: the main concept is that deployment pipelines should handle pre- and post-processing the same way as training for best results.

BTW, I'm implementing new benchmarking which processes all officially supported output formats. Qualcomm is not among them, but the idea is that every exported model is profiled for speed and tested for accuracy. Example results on CPU are:

https://github.com/ultralytics/yolov5/blob/updates/benchmarks/utils/benchmarks.py

benchmarks: weights=/usr/src/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/usr/src/yolov5/data/coco128.yaml
...

Benchmarks complete (782.19s)
                   Weights  mAP@0.5:0.95  Inference time (ms)
0               yolov5s.pt      0.407554           102.777191
1      yolov5s.torchscript      0.402908           132.848348
2             yolov5s.onnx      0.402908            89.061137
3   yolov5s_openvino_model      0.402908            67.093970
4                   engine           NaN                  NaN
5                   coreml           NaN                  NaN
6      yolov5s_saved_model      0.402908           133.983964
7               yolov5s.pb      0.402908           101.405423
8      yolov5s-fp16.tflite      0.402851           502.289245
9                  edgetpu           NaN                  NaN
10                    tfjs           NaN                  NaN

jayer95 commented 2 years ago

@glenn-jocher

How to read .dlc with python is a big problem, because SNPE is currently too closed. Most of the people who use SNPE are running AI on Android phones, and Qualcomm's chips also support TFLite.

jayer95 commented 2 years ago

@glenn-jocher

If necessary, I can convert the author's official yolov5s.pt into .dlc for everyone to test.

Mohit-Ak commented 2 years ago

Hi, @jayer95

I modified YOLOv5's detect.py to skip the .pt prediction part and read the .raw file from the DLC instead. In detect.py, after the following lines:

# Inference
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(im, augment=augment, visualize=visualize)

I added these to parse the raw file and get the predictions:

        print("Original_prediction_shape - %s"%(str(pred.shape)))
        print('####################################')
        CUSTOM_NUMBER_OF_CLASSES=80  ##Custom number of classes in your Yolov5 network
        t3 = time_sync()
        dt[1] += t3 - t2
        #Overriding model predictions with predictions from raw file - Begin
        if raw_image_path!="":
            pred_from_non_dlc_model=pred #Savinf the prediction from pytorch model in a variable
            bbox_file = raw_image_path #Raw image path which is the output of snpe-net-run
            bbox_array = np.fromfile(bbox_file, dtype=np.float32)
            print("bbox_array shape - %s"%(str(bbox_array.shape)))
            # # Reshape numpy arrays to Torch tensors for processing

            bbox_array_2D = np.reshape(bbox_array, (-1,CUSTOM_NUMBER_OF_CLASSES))
            print("bbox_array_2D shape - %s"%(str(bbox_array_2D.shape)))
            bbox_array_3D = bbox_array_2D[newaxis, :, :]
            print("bbox_array_3D shape - %s"%(str(bbox_array_3D.shape)))
            bbox_torch = torch.tensor(bbox_array_3D, device=device)
            print("bbox_torch shape - %s"%(str(bbox_torch.shape)))
            print('####################################')
            pred=bbox_torch

Once you get this, I had to make some changes to non_max_suppression in general.py, as my confidence scores were all very far off.

jayer95 commented 2 years ago

@Mohit-Ak

Hi, are the steps and details of your model conversion exactly the same as mine? If they differ, can you share your conversion process for my reference? I'm currently working closely with Qualcomm's internal engineers responsible for SNPE.

Is the detect.py you use from the latest v6.0 version of yolov5? Could you provide a link to your detect.py, and mark the functions you added with comments? Thanks.

jayer95 commented 2 years ago

Hi all,

I convert yolov5n.pt (trained at --imgsz 320) to yolov5n.onnx with --imgsz 192 320 (the letterbox concept), and then use SNPE 1.58 to convert to yolov5n.dlc at 192x320.
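The commands follow the same recipe as the 384x640 example above, roughly as below (the out_node IDs differ per export, so check them in Netron; <n1>..<n3> here are placeholders):

    python export.py --weights yolov5n.pt --optimize --opset 11 --simplify --imgsz 192 320
    snpe-onnx-to-dlc -i yolov5n.onnx --out_node <n1> --out_node <n2> --out_node <n3> -d images 1,3,192,320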

I use "gst-launch-1.0 / qtimlesnpe " to parse yolov5n.dlc to demo, the detection effect is very good, close to lossless conversion.

[screenshot of detection results]

video: https://drive.google.com/file/d/1-eEi8dkh_3mLxd3CEnRpqPFLJJ4G5FPH/view?usp=sharing

fwzdev1 commented 2 years ago

At present, yolov5 v6.0 can be converted for SNPE correctly.

onnx==1.6.0
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.1.0
scikit-learn==0.19.2
numpy==1.19.5
protobuf==3.17.3
torch==1.10.0

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
git checkout v6.0

python export.py --weights yolov5n.pt --optimize --opset 11 --simplify

Please use Netron to view the exported yolov5n.onnx. You will find that the layers just above the 5D reshapes are 4D Conv outputs: Conv_198, Conv_232, Conv_266, whose output nodes are 326, 379, 432. We need to specify these 3 output nodes when converting yolov5n.dlc.

But at present, a program is still needed to demo the converted yolov5n.dlc. The most important thing is that the inference program must contain yolov5's "letterbox" preprocessing algorithm, to ensure that "letterbox" is used in inference just as in training.

Hi, I followed the process you described, including git-cloning the current version, but the snpe-onnx-to-dlc error is still there. Then I tried your onnx file and nothing changed (I didn't use --out_node). Do you have any idea? I used SNPE 1.55.0; is it updating SNPE, rather than yolov5, that allows the smooth conversion?

Update: Yes, I should update SNPE to 1.58.00 or newer.

fwzdev1 commented 2 years ago

Key point: SNPE >= 1.58.00 works well for both yolov5 v6.0 and v6.1.

hansoullee20 commented 2 years ago

@fwzdev1 @jayer95 Hi all thank you for sharing your progress.

Have you checked the output.raw dimensions? I am not getting the expected output. When I read the output.raw file using the following commands:

    data1 = np.fromfile('/root/output/Result_0/output.raw', np.float32)
    data1 = data1.reshape(1, 10647, 6)

it leads to:

    Exception has occurred: ValueError
    cannot reshape array of size 15970 into shape (1,10647,6)
      File "/root/onnx_dlc_compare/comp.py", line 17, in <module>
        data1 = data1.reshape(1, 10647, 6)

but when i run snpe-dlc-info, it clearly says

| 257 | Concat_301  | concatenation | 457, 468, 473 | 474    | 1x3x13x13x6 | G C     | axis: 4 |
| 258 | Reshape_302 | reshape       | 474           | 481    | 1x507x6     | A D G C |         |
| 259 | Concat_303  | concatenation | 377, 429, 481 | output | 1x10647x6   | A D G C | axis: 1 |

Note: The supported runtimes column assumes a processor target of Snapdragon 835 (8998) Key : A:AIP D:DSP G:GPU C:CPU

The expected dimension is clearly 1x10647x6.

Have you also hit this error?
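As a quick sanity check (plain arithmetic, assuming float32 values), the file read above cannot be the concatenated output:

    import numpy as np

    expected = 1 * 10647 * 6  # Concat_303 output = 63882 float32 values
    data = np.fromfile('/root/output/Result_0/output.raw', np.float32)
    print(data.size, 'vs expected', expected)
    # 15970 != 63882, so this .raw is not the 1x10647x6 tensor; check which
    # output snpe-net-run actually saved (e.g. a different Result_*/*.raw)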

jayer95 commented 2 years ago

@hansoullee20

Please note that the three ONNX nodes I captured when converting the DLC are all 4D outputs, which conforms to SNPE's 4D format; please refer to: https://developer.qualcomm.com/sites/default/files/docs/snpe/image_input.html

Also, what program are you using to parse YOLOv5.dlc? Did you refer to my steps above when you converted the dlc?

If you did not strictly follow the conversion steps above, please study them. My conversion has been verified by Qualcomm internal staff.

Please note that the latest yolov5 v6.1 defaults to --opset 12 when exporting ONNX, but SNPE still supports only ONNX --opset 11 as of its latest version, 1.59.

@glenn-jocher In addition, I would like to ask: why did --opset default to 13 in v6.0 and change to 12 in v6.1?

jayer95 commented 2 years ago

@fwzdev1

Congratulations on your successful conversion. The rest is working out how to parse yolov5.dlc with the SNPE SDK.

zhiqwang commented 2 years ago

Hi @jayer95

In addition, I would like to ask the author, why is --opset defaulted to 13 in v6.0, and changed to 12 in v6.1?

The main consideration is compatibility for OpenVINO export; check out the following for more details: https://github.com/ultralytics/yolov5/pull/6057#issuecomment-998902797

jayer95 commented 2 years ago

@zhiqwang OK. I have also used OpenVINO, but I have been developing for Qualcomm products recently, and opset 11 currently has the best support in Qualcomm SNPE. https://developer.qualcomm.com/sites/default/files/docs/snpe/supported_onnx_ops.html

fwzdev1 commented 2 years ago

@fwzdev1

Congratulations on your successful conversion. The rest is working out how to parse yolov5.dlc with the SNPE SDK.

Thank you for your sharing. @jayer95

I was stuck using the DSP/AIP to run the network because of the unsupported 5-dimensional reshape operation, and the speed on CPU (100+ ms) is totally unacceptable. After asking Qualcomm staff, the bad news is that there is no option but to change the yolov5 network, especially the detect head part, which is a little bit tricky.