yolov5: Assertion scale_1 failed

nwself commented 3 years ago

Env

GPU TX2
OS Ubuntu 18.04
Cuda 10.2.89
TensorRT 7.1.3.0

About this repo

Same behavior on the yolov5-v4.0 tag and on current master (commit f8c537586a86347a4b82c19a59edddf6438744a7)
yolov5

Your problem

We generated a .wts file elsewhere with 11 classes.
We updated yololayer.h to set CLASS_NUM = 11.
We ran sudo ./yolov5 -s model.wts yolov5s.engine s

Output is:

Loading weights: ./model.wts
[06/11/2021-11:12:25] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::482, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov5: /home/nvidia/clones/v4/tensorrtx/yolov5/common.hpp:155: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string<char>, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted

Expected a .engine file to be generated.

I'm not sure what to make of this crash. Is there any good way to debug it?
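
One way to narrow this down, as a rough sketch rather than anything shipped with tensorrtx: the TensorRT message complains about a scale layer whose weights are inconsistent, and the assertion fires in addBatchNorm2d, which takes a std::map of weights keyed by name. A plausible cause is that a name the engine builder looks up is not present in the .wts file, since std::map::operator[] hands back an empty entry for a missing key. The Python below lists the tensor names in a .wts file and reports which expected names are absent. It assumes the text layout written by tensorrtx's gen_wts.py (first line is the tensor count, then one line per tensor: name, element count, hex values); read_wts_entries and missing_names are hypothetical helper names.

```python
def read_wts_entries(path):
    """Parse a .wts text file and return {tensor_name: element_count}.

    Assumed layout: first line holds the number of tensors, then each line
    is "<name> <count> <hex> <hex> ...".
    """
    entries = {}
    with open(path, "r") as f:
        declared = int(f.readline().strip())
        for line in f:
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            name, count = parts[0], int(parts[1])
            found = len(parts) - 2
            if found != count:
                raise ValueError(f"{name}: header says {count} values, found {found}")
            entries[name] = count
    if len(entries) != declared:
        raise ValueError(f"file declares {declared} tensors, found {len(entries)}")
    return entries


def missing_names(entries, expected):
    """Return the expected tensor names that are absent from the file."""
    return [name for name in expected if name not in entries]
```

To use it, call read_wts_entries("model.wts") and pass the result to missing_names together with the names the builder asks for, for example the layer-name strings in yolov5.cpp with the ".bn.weight", ".bn.bias", ".bn.running_mean" and ".bn.running_var" suffixes appended. Any name reported as missing suggests the .wts file and the C++ network definition were made for different model versions.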

Alex-Beh commented 3 years ago

May I know how you trained the model? Did you start from the pretrained weights in yolov5? Which command did you run to train yolov5 with 11 classes?

nwself commented 3 years ago

Yes, we used the pretrained weights provided with yolov5.

The training command:

python train.py \
--batch-size 64 \
--data data/merged_bdd100k_license.yaml \
--cfg models/merged_bdd100K_license/yolov5s.yaml \
--weights weights/yolov5s.pt \
--hyp data/hyp.finetune.yaml \
--name merged_5s

The data/merged_bdd100k_license.yaml file

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: /dataset/merged_bdd100k_license/images/train/  # 74000 images
val: /dataset/merged_bdd100k_license/images/val/  # 10178 images
# number of classes
nc: 11
# class names
names: ['bike', 'bus', 'car', 'motor', 'person', 'rider', 'traffic light', 'traffic sign', 'train', 'truck', 'license']

The models/merged_bdd100K_license/yolov5s.yaml file

# parameters
nc: 11  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

Alex-Beh commented 3 years ago

@nwself I didn't pass in the cfg and hyp files. Maybe try the following command and train again:

python train.py \
--batch-size 64 \
--data data/merged_bdd100k_license.yaml \
--weights weights/yolov5s.pt \
--name merged_5s

lucky2046 commented 3 years ago

Any progress?

nrpr161293 commented 3 years ago

I have exactly the same error, and the environment setup is similar!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

DavidBaldsiefen commented 2 years ago

Same issue here, any news?

Edit: Fixed it by checking out the correct branch. In my case I was using yolov5m V5 and had to switch to the corresponding branch on tensorrtx.

wang-xinyu commented 12 months ago

> Same issue here, any news?
>
> Edit: Fixed it by checking out the correct branch, in my case I was using yolov5m V5 and had to switch to the corresponding branch on tensorrtx

Yes, please use the correct version of yolov5 and tensorrtx-yolov5.
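
A quick way to check which yolov5 release a model config was written for is to look at the module types it uses. The sketch below counts them from the config text using only a regular expression, so it needs no YAML library; modules_in_cfg is a hypothetical helper name. As far as I know, yolov5 v4.0 configs use C3 blocks where earlier releases used BottleneckCSP, but treat that as an assumption and compare against the yolov5s.yaml shipped with the tag you trained on.

```python
import re
from collections import Counter

# Matches one layer row such as "[-1, 1, Conv, [128, 3, 2]]" or
# "[[-1, 6], 1, Concat, [1]]" and captures the module name.
LAYER_ROW = re.compile(
    r"\[\s*(?:-?\d+|\[[^\]]*\])\s*,\s*\d+\s*,\s*([A-Za-z_][\w.]*)\s*,"
)


def modules_in_cfg(cfg_text):
    """Count the module types named in a yolov5-style model config."""
    counts = Counter()
    for line in cfg_text.splitlines():
        code = line.split("#", 1)[0]  # drop trailing comments
        match = LAYER_ROW.search(code)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling modules_in_cfg(open("models/merged_bdd100K_license/yolov5s.yaml").read()) on the config posted above would report BottleneckCSP and no C3. If the assumption about v4.0 holds, that config matches an older yolov5 layout than the yolov5-v4.0 tensorrtx branch expects.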

wang-xinyu / tensorrtx

yolov5: Assertion scale_1 failed #591
