marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

No detections on custom data with YOLO-NAS #364

Closed: arielkantorovich closed this issue 1 year ago

arielkantorovich commented 1 year ago


Hi, thank you for the great repo. I ran YOLO-NAS with the COCO weights and everything works fine. When I converted the .pth file to .onnx I did not use the --dynamic flag. Now I want to use my own trained network: the model is again YOLO-NAS (the small version) and its name is Gazbo_Nas.pth. I successfully converted it to Gazbo_Nas.onnx and got Gazbo_Nas.onnx_b1_gpu0_fp32.engine, but when I run the pipeline nothing is detected. I think there is some configuration problem that I do not understand. Of course, before moving to DeepStream I checked my results and saw that the model works outside DeepStream using the super-gradients library. I would be happy for any advice, thank you.

I attached the configuration files: deepstream_app_config.txt, labels2.txt, config_infer_primary_yolonas.txt

marcoslucianops commented 1 year ago

Try again with the updated files I just uploaded to the repo. Export the ONNX model with the new export file, generate the TensorRT engine again with the updated files, and use the new config_infer_primary_yolonas.txt file.
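
For example, with the file names from the first comment, the full cycle would be something like the sketch below (removing the old engine forces DeepStream to rebuild it from the new ONNX):

```
# re-export the ONNX model with the updated export script
python3 export_yolonas.py -m yolo_nas_s -w Gazbo_Nas.pth

# remove the stale engine so it is rebuilt from the new ONNX
rm -f Gazbo_Nas.onnx_b1_gpu0_fp32.engine

# run the pipeline with the updated config files
deepstream-app -c deepstream_app_config.txt
```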

kaarelkivistik commented 1 year ago

> Try again with the updated files I just uploaded to the repo. Export the ONNX model with the new export file, generate the TensorRT engine again with the updated files, and use the new config_infer_primary_yolonas.txt file.

I have the same issue. If I lower the score and IoU thresholds to 0.1, I do see some weird static bounding boxes. It didn't improve with the latest update (I re-exported the ONNX file and regenerated the TensorRT engine; YOLO-NAS COCO started working properly, but the custom model didn't).

By the way, awesome work! I really like the ONNX approach you've chosen.

arielkantorovich commented 1 year ago

Hi. Unfortunately, that didn't solve the problem. I tried to convert the model using the super-gradients library:

```python
import torch

from super_gradients.training import models
from super_gradients.common.object_names import Models

# Load the model with the trained weights
model = models.get(model_name=Models.YOLO_NAS_S, num_classes=5, checkpoint_path="/path")
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 640, 640])

# Export to ONNX with a fixed 1x3x640x640 input
dummy_input = torch.zeros(1, 3, 640, 640)
torch.onnx.export(model, dummy_input, "yolo_nas_m.onnx")
```

I succeeded in getting the ONNX file, but when I try to convert it to TensorRT I get:

```
Using winsys: x11
WARNING: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.1/sources/deepstream_python_apps/apps/DeepStream-Yolo/gazbo_model_b1_gpu0_fp32.engine open error
0:00:05.311159880  4453 0x162b150 WARN  nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.1/sources/deepstream_python_apps/apps/DeepStream-Yolo/gazbo_model_b1_gpu0_fp32.engine failed
0:00:05.372055404  4453 0x162b150 WARN  nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.1/sources/deepstream_python_apps/apps/DeepStream-Yolo/gazbo_model_b1_gpu0_fp32.engine failed, try rebuild
0:00:05.372172397  4453 0x162b150 INFO  nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:363: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: [TRT]: ModelImporter.cpp:748: While parsing node number 501 [NonZero -> "onnx::Transpose_1564"]:
ERROR: [TRT]: ModelImporter.cpp:749: --- Begin node ---
ERROR: [TRT]: ModelImporter.cpp:750: input: "onnx::NonZero_1563" output: "onnx::Transpose_1564" name: "NonZero_501" op_type: "NonZero"
ERROR: [TRT]: ModelImporter.cpp:751: --- End node ---
ERROR: [TRT]: ModelImporter.cpp:753: ERROR: builtin_op_importers.cpp:4941 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

Could not parse the ONNX model

Failed to build CUDA engine
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:08.867142524  4453 0x162b150 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() [UID = 1]: build engine file failed
0:00:08.926442253  4453 0x162b150 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() [UID = 1]: build backend context failed
0:00:08.926687824  4453 0x162b150 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() [UID = 1]: generate backend failed, check config file settings
0:00:08.926864370  4453 0x162b150 WARN  nvinfer gstnvinfer.cpp:846:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:08.926915538  4453 0x162b150 WARN  nvinfer gstnvinfer.cpp:846:gst_nvinfer_start: error: Config file path: /opt/nvidia/deepstream/deepstream-6.1/sources/deepstream_python_apps/apps/DeepStream-Yolo/config_infer_primary_yolonas.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: : Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(846): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-6.1/sources/deepstream_python_apps/apps/DeepStream-Yolo/config_infer_primary_yolonas.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed
```

marcoslucianops commented 1 year ago

You need to use the export_yolonas.py script from this repo. Did you change any parameters in the image preprocessing of your trained model?

arielkantorovich commented 1 year ago

I tried something else because I thought there was a problem with export_yolonas.py; when I use export_yolonas.py I get the same results that I described in my first comment. My trained model parameters: yolo_nas_s, 640x640 image size, and number of classes = 5. Of course, I specified these parameters in config_infer_primary_yolonas.txt.
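
For reference, a sketch of how those values map to config_infer_primary_yolonas.txt (standard nvinfer property names; the file names are the ones from my first comment):

```
[property]
onnx-file=Gazbo_Nas.onnx
model-engine-file=Gazbo_Nas.onnx_b1_gpu0_fp32.engine
labelfile-path=labels2.txt
num-detected-classes=5
```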

marcoslucianops commented 1 year ago

I meant image normalization or other preprocessing. TensorRT drops the model's accuracy a bit, but in my experience I still get output from my models.

arielkantorovich commented 1 year ago

I don't do anything special, just transfer learning on my data from the COCO weights, exactly like the super-gradients notebook. But I noticed something: when I don't use the --simplify flag I get this error:

```
python3 export_yolonas.py -m yolo_nas_s -w Gazbo_Nas.pth
The console stream is logged into /home/uvision/sg_logs/console.log
[2023-06-08 03:33:23] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
[2023-06-08 03:33:35] WARNING - __init__.py - Failed to import pytorch_quantization
[2023-06-08 03:33:35] WARNING - calibrator.py - Failed to import pytorch_quantization
[2023-06-08 03:33:35] WARNING - export.py - Failed to import pytorch_quantization
[2023-06-08 03:33:35] WARNING - selective_quantization_utils.py - Failed to import pytorch_quantization

Starting: Gazbo_Nas.pth
Opening YOLO-NAS model

Traceback (most recent call last):
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/super_gradients/training/utils/checkpoint_utils.py", line 58, in adaptive_load_state_dict
    net.load_state_dict(state_dict, strict=strict_bool)
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for YoloNAS_S:
    Missing key(s) in state_dict: "backbone.stem.conv.branch_3x3.conv.weight", "backbone.stem.conv.branch_3x3.bn.weight", "backbone.stem.conv.branch_3x3.bn.bias", "backbone.stem.conv.branch_3x3.bn.running_mean", "backbone.stem.conv.branch_3x3.bn.running_var", "backbone.stem.conv.branch_1x1.weight", "backbone.stem.conv.branch_1x1.bias", "backbone.stem.conv.post_bn.weight", "backbone.stem.conv.post_bn.bias", "backbone.stem.conv.post_bn.running_mean", "backbone.stem.conv.post_bn.running_var", ..., "backbone.stage2.blocks.bottlenecks.0.cv2.rbr_reparam.bias", "backbone.stage2.blocks.bottlenecks.1.alpha", "backbone.stage2.blocks.bottlenecks.1.cv1.branch_3x3.conv.weight", "backbone.stage2.blocks.bottlenecks.1.cv1.branch_3x3.bn.weight", ..., "module.heads.head1.cls_convs.0.seq.conv.weight", "module.heads.head1.cls_convs.0.seq.bn.weight", "module.heads.head1.cls_convs.0.seq.bn.bias", ..., "module.heads.head3.cls_convs.0.seq.bn.running_var", "module.heads.head3.cls_convs.0.seq.bn.num_batches_tracked", "module.heads.head3.reg_convs.0.seq.conv.weight", "module.heads.head3.reg_convs.0.seq.bn.weight", "module.heads.head3.reg_convs.0.seq.bn.bias", "module.heads.head3.reg_convs.0.seq.bn.running_mean", "module.heads.head3.reg_convs.0.seq.bn.running_var", "module.heads.head3.reg_convs.0.seq.bn.num_batches_tracked", "module.heads.head3.cls_pred.weight", "module.heads.head3.cls_pred.bias", "module.heads.head3.reg_pred.weight", "module.heads.head3.reg_pred.bias".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "export_yolonas.py", line 104, in <module>
    sys.exit(main(args))
  File "export_yolonas.py", line 43, in main
    model = yolonas_export(args.model, args.weights, args.classes, args.size)
  File "export_yolonas.py", line 29, in yolonas_export
    model = models.get(model_name, num_classes=num_classes, checkpoint_path=weights)
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/super_gradients/training/models/model_factory.py", line 208, in get
    _ = load_checkpoint_to_model(
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/super_gradients/training/utils/checkpoint_utils.py", line 229, in load_checkpoint_to_model
    adaptive_load_state_dict(net, checkpoint, strict)
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/super_gradients/training/utils/checkpoint_utils.py", line 61, in adaptive_load_state_dict
    adapted_state_dict = adapt_state_dict_to_fit_model_layer_names(net.state_dict(), state_dict, solver=solver)
  File "/home/uvision/Downloads/yolo-nas-venv/lib/python3.8/site-packages/super_gradients/training/utils/checkpoint_utils.py", line 159, in adapt_state_dict_to_fit_model_layer_names
    raise ValueError(f"ckpt layer {ckpt_key} with shape {ckpt_val.shape} does not match {model_key}" f" with shape {model_val.shape} in the model")
ValueError: ckpt layer module.heads.head1.cls_pred.weight with shape torch.Size([5, 64, 1, 1]) does not match heads.head1.cls_pred.weight with shape torch.Size([80, 64, 1, 1]) in the model
```

It looks like the export expects the head layer to have 80 classes (like COCO), but my model has only 5 classes; maybe this is the problem? When I use export_yoloV5.py everything works fine (of course, with YOLOv5 weights).
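
In other words, the checkpoint's head was trained with 5 output channels while the freshly built model defaults to an 80-class COCO head. A minimal sketch of what the exporter does internally (as the models.get call in the traceback shows), with the class count matching the checkpoint:

```python
from super_gradients.training import models

# Build the architecture with the same number of classes the checkpoint
# was trained with; otherwise load_state_dict fails on heads.*.cls_pred
# (torch.Size([5, 64, 1, 1]) vs torch.Size([80, 64, 1, 1]) above).
model = models.get("yolo_nas_s", num_classes=5, checkpoint_path="Gazbo_Nas.pth")
```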

arielkantorovich commented 1 year ago

> I have the same issue. If I lower the score and IoU thresholds to 0.1, I do see some weird static bounding boxes. It didn't improve with the latest update (I re-exported the ONNX file and regenerated the TensorRT engine; YOLO-NAS COCO started working properly, but the custom model didn't).

I would be happy to know if you managed to solve this problem in your case.

marcoslucianops commented 1 year ago

The command for a custom YOLO-NAS model is:

```
python3 export_yolonas.py -m yolo_nas_s -w Gazbo_Nas.pth -n 5
```

The -n (or --classes) argument sets the number of classes in your model.
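
With the input size pinned and the graph simplified as discussed above, the call might look like this (the -s and --simplify flags are assumed from the repo's documentation; --simplify requires onnx-simplifier):

```
python3 export_yolonas.py -m yolo_nas_s -w Gazbo_Nas.pth -n 5 -s 640 --simplify
```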

storm12t48 commented 1 year ago

Maybe off topic, but have you found a way to install super-gradients on a Jetson? onnx-simplifier simply does not want to install, even with cmake built from source.

arielkantorovich commented 1 year ago

@storm12t48 Hi, I first installed cmake using pip install cmake, and then installed onnx-simplifier; in my case that worked.
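
In shell form, that is simply (a minimal sketch, assuming pip inside the Jetson's Python environment):

```
# cmake must be available before onnx-simplifier can build its native extension
pip install cmake
pip install onnx-simplifier
```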

arielkantorovich commented 1 year ago

@marcoslucianops did you find a solution or an explanation for why YOLO-NAS doesn't detect objects? When I run YOLOv5 everything works fine.

storm12t48 commented 1 year ago

> @storm12t48 Hi, I first installed cmake using pip install cmake, and then installed onnx-simplifier; in my case that worked.

But that installs onnxsim, which I already know how to do without going through cmake. Is onnxsim the same as onnx-simplifier?

marcoslucianops commented 1 year ago

@arielkantorovich Can you send the model to my email for testing?

arielkantorovich commented 1 year ago

@marcoslucianops sure, I will send you the .pth weights and a short video so you can check the results. Please send me your email, and again, thank you for helping me.

marcoslucianops commented 1 year ago

I didn't receive the model, but I fixed the issue.

I added the config_infer_primary_yolonas_custom.txt file to the repo. Use it for custom YOLO-NAS models.

The pretrained YOLO-NAS model uses net-scale-factor=0.0039215697906911373, while custom-trained YOLO-NAS models use net-scale-factor=1.
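
The value 0.0039215697906911373 is approximately 1/255, i.e. the pretrained model expects pixels scaled to [0, 1], while with net-scale-factor=1 the network receives raw 0-255 pixel values. A minimal sketch of the relevant difference between the two config files:

```
# In config_infer_primary_yolonas.txt (pretrained COCO weights):
net-scale-factor=0.0039215697906911373

# In config_infer_primary_yolonas_custom.txt (custom-trained models):
net-scale-factor=1
```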