wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API
MIT License

yolov7: Assertion scale_1 failed on custom trained model #1235

Closed abhinavrawat27 closed 1 year ago

abhinavrawat27 commented 1 year ago

Your problem

I trained a yolov7 tiny model on a custom dataset. I tested the model and it works fine. This is the command I used to train it:

python train.py --workers 1 --batch-size 2 --epochs 200 --device 0 --img 640 --data data/custom_data.yaml --hyp data/hyp.scratch.custom.yaml --cfg cfg/training/yolov7-custom.yaml --name yolov7-custom --weights yolov7-tiny.pt

Now I want to generate its .engine file on a Jetson Xavier. I successfully converted the .pt file to .wts, set the number of classes to 1 in config.h, created the build directory, and ran cmake .. inside it. While running make, I got two warnings:

/home/john/Documents/tensorrtx/yolov7/plugin/yololayer.h(58): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int32_t, const nvinfer1::Dims *, int32_t, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int32_t)" is hidden by "nvinfer1::YoloLayerPlugin::configurePlugin" -- virtual function override intended?

/home/john/Documents/tensorrtx/yolov7/plugin/yololayer.h(58): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int32_t, const nvinfer1::Dims *, int32_t, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int32_t)" is hidden by "nvinfer1::YoloLayerPlugin::configurePlugin" -- virtual function override intended?

Apart from these warnings, the build finished with no other issues. Finally, running sudo ./yolov7 -s best-tiny.wts best-tiny.engine t produced the errors below:

Loading weights: best-tiny.wts
[02/16/2023-17:14:03] [E] [TRT] 3: model.0.conv:kernel weights has count 1080 but 864 was expected
[02/16/2023-17:14:03] [E] [TRT] 4: model.0.conv: count of 1080 weights in kernel, but kernel dimensions (3,3) with 3 input channels, 32 output channels and 1 groups were specified. Expected Weights count is 3 * 3*3 * 32 / 1 = 864
[02/16/2023-17:14:03] [E] [TRT] 4: [convolutionNode.cpp::computeOutputExtents::28] Error Code 4: Internal Error (model.0.conv: number of kernel weights does not match tensor dimensions)
[... the same three model.0.conv errors repeat 10 more times with the same timestamp ...]
[02/16/2023-17:14:03] [E] [TRT] 3: [network.cpp::addScale::616] Error Code 3: Internal Error (Parameter check failed at: optimizer/api/network.cpp::addScale::616, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
)
yolov7: /home/john/Documents/tensorrtx/yolov7/src/block.cpp:81: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string<char>, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted

Please let me know what I am missing. Thanks.
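
The weight-count formula printed in the log can be inverted to see how many output channels the .wts file actually contains for model.0.conv. The sketch below is only arithmetic on the numbers from the log (3 input channels, 3x3 kernel, 1 group); the function name is hypothetical and not part of tensorrtx or TensorRT.

```python
def implied_out_channels(weight_count, in_channels, kernel_hw, groups=1):
    """Invert the check from the log:
    weight_count = in_channels * kh * kw * out_channels / groups."""
    kh, kw = kernel_hw
    per_filter = in_channels * kh * kw
    total = weight_count * groups
    if total % per_filter != 0:
        raise ValueError("weight count is not a multiple of in_channels * kh * kw")
    return total // per_filter


# Numbers copied from the error messages above.
expected = implied_out_channels(864, in_channels=3, kernel_hw=(3, 3))
actual = implied_out_channels(1080, in_channels=3, kernel_hw=(3, 3))
print(expected, actual)  # prints: 32 40
```

So the network built for the t argument expects 32 output channels in the first convolution, while the trained weights hold 40. This points to a difference in model structure rather than a corrupted .wts file.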

wang-xinyu commented 1 year ago

We only support the v0.1 branch of yolov7, which means you can only use the .yaml files in this folder: https://github.com/WongKinYiu/yolov7/tree/v0.1/cfg/deploy

abhinavrawat27 commented 1 year ago

Hi, do you mean that the yaml I used for training (cfg/training/yolov7-custom.yaml) should come from the v0.1 branch?

wang-xinyu commented 1 year ago

No, you can only use the yaml files in this folder: https://github.com/WongKinYiu/yolov7/tree/v0.1/cfg/deploy

If you have a custom model and its structure is different, you also need to adapt the model definition in the C++ code: https://github.com/wang-xinyu/tensorrtx/blob/f92dcf43dcbe346c357edfa4cc976eb9d0d95470/yolov7/src/model.cpp#L1775
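
One way to see where a custom model departs from the C++ definition is to list the per-tensor value counts stored in the .wts file and compare them layer by layer. The sketch below assumes the plain-text layout that the tensorrtx gen_wts scripts appear to write: a first line with the number of tensors, then one line per tensor of the form "name count hex hex ...". The helper name read_wts_counts is hypothetical, and the sample data is synthetic.

```python
import io


def read_wts_counts(stream):
    """Return {tensor_name: value_count} from a .wts-style text stream.

    Assumed layout (not verified against every tensorrtx version):
    line 1 is the number of tensors, each later line is
    '<name> <count> <hex> <hex> ...'.
    """
    n_tensors = int(stream.readline().strip())
    counts = {}
    for _ in range(n_tensors):
        parts = stream.readline().split()
        if len(parts) < 2:
            raise ValueError("truncated .wts data")
        name, count = parts[0], int(parts[1])
        if len(parts) - 2 != count:
            raise ValueError(
                f"{name}: header says {count} values, found {len(parts) - 2}"
            )
        counts[name] = count
    return counts


# Synthetic two-tensor sample, not taken from a real model.
sample = (
    "2\n"
    "model.0.conv.weight 3 3f800000 00000000 bf800000\n"
    "model.0.bn.weight 1 3f800000\n"
)
counts = read_wts_counts(io.StringIO(sample))
print(counts)
```

On a real file, the same function could be called as read_wts_counts(open("best-tiny.wts")), and the count for the first convolution compared with what the C++ network expects (864 in the log above).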
