pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

An Issue About Training a Custom Model #2558

Closed: Cagan55 closed this issue 2 years ago

Cagan55 commented 2 years ago

Hi, when I try to train a custom model, I get a cuDNN error. I don't understand the error or what I should do about it. This is the command I ran in cmd:

`darknet.exe detector train data/obj.data cfg/yolov4-obj.cfg yolov4.conv.137 -dont_show`

This is the log:

CUDA-version: 11000 (11070), cuDNN: 8.4.1, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.5.2
yolov4-obj
0 : compute_capability = 860, cudnn_half = 1, GPU: NVIDIA GeForce RTX 3060 Laptop GPU
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 Create CUDA-stream - 0
Create cudnn-handle 0
conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 64 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 64 0.354 BF
3 route 1 -> 208 x 208 x 64
4 conv 64 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 64 0.354 BF
5 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
6 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
7 Shortcut Layer: 4, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF
8 conv 64 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 64 0.354 BF
9 route 8 2 -> 208 x 208 x 128
10 conv 64 1 x 1/ 1 208 x 208 x 128 -> 208 x 208 x 64 0.709 BF
11 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
12 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
13 route 11 -> 104 x 104 x 128
14 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
15 conv 64 1 x 1/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.089 BF
16 conv 64 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
17 Shortcut Layer: 14, wt = 0, wn = 0, outputs: 104 x 104 x 64 0.001 BF
18 conv 64 1 x 1/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.089 BF
19 conv 64 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
20 Shortcut Layer: 17, wt = 0, wn = 0, outputs: 104 x 104 x 64 0.001 BF
...
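As a sanity check on the layer table, the output sizes and BF (billions of floating-point operations) can be reproduced by hand. This is a minimal Python sketch, assuming padding of kernel // 2 and 2 FLOPs per multiply-accumulate; these are our assumptions, chosen because they reproduce the logged numbers, not formulas quoted from darknet. It also inverts `mini_batch = 1, batch = 64`, assuming mini_batch = batch / subdivisions, which would put `subdivisions` at 64 in yolov4-obj.cfg. The helper names `conv_out`, `conv_bflops` and `subdivisions` are hypothetical.

```python
# Sketch: reproduce numbers from the darknet layer table by hand.
# Assumptions (ours, not darknet's code): padding = kernel // 2,
# and BFLOPs = 2 * multiply-accumulates / 1e9.


def conv_out(size_in: int, kernel: int, stride: int) -> int:
    """Spatial output size of a conv layer, assuming padding = kernel // 2."""
    pad = kernel // 2
    return (size_in + 2 * pad - kernel) // stride + 1


def conv_bflops(h_in: int, w_in: int, c_in: int,
                filters: int, kernel: int, stride: int) -> float:
    """Billions of floating-point ops, counting 2 FLOPs per multiply-accumulate."""
    h_out = conv_out(h_in, kernel, stride)
    w_out = conv_out(w_in, kernel, stride)
    macs = h_out * w_out * filters * kernel * kernel * c_in
    return 2 * macs / 1e9


def subdivisions(batch: int, mini_batch: int) -> int:
    """Assuming mini_batch = batch / subdivisions, recover subdivisions."""
    return batch // mini_batch


# Layer 0: conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
print(conv_out(416, 3, 1), round(conv_bflops(416, 416, 3, 32, 3, 1), 3))    # 416 0.299
# Layer 1: conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
print(conv_out(416, 3, 2), round(conv_bflops(416, 416, 32, 64, 3, 2), 3))   # 208 1.595
# Log header: mini_batch = 1, batch = 64
print(subdivisions(64, 1))                                                   # 64
```

The same formula also matches the 1 x 1 rows, for example layer 2 at 0.354 BF.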

Total BFLOPS 59.563
avg_outputs = 489778
Allocate additional workspace_size = 79.04 MB
Loading weights from yolov4.conv.137...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 137 layers from weights-file
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Detection layer: 139 - type = 28
Detection layer: 150 - type = 28
Detection layer: 161 - type = 28
Resizing, random_coef = 1.40

608 x 608
Create 6 permanent cpu-threads
try to allocate additional workspace_size = 81.03 MB
CUDA allocate done!
Loaded: 0.011000 seconds
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 139 Avg (IOU: 0.000000), count: 1, class_loss = 4821.158691, iou_loss = 0.000000, total_loss = 4821.158691
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 150 Avg (IOU: 0.000000), count: 1, class_loss = 1261.181641, iou_loss = 0.000000, total_loss = 1261.181641
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 161 Avg (IOU: 0.416626), count: 1, class_loss = 328.948456, iou_loss = 0.013824, total_loss = 328.962280
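The lines `Resizing, random_coef = 1.40` and `608 x 608` indicate that training started at an input larger than the 416 x 416 shown in the layer table. A minimal Python sketch of that arithmetic, assuming the side length is scaled by random_coef and rounded up to a multiple of 32; this assumption happens to reproduce 608 but is not darknet's actual code. It also checks that each Region line's total_loss equals class_loss + iou_loss. The helper names `max_train_size` and `total_loss` are hypothetical.

```python
import math


def max_train_size(init: int, random_coef: float, step: int = 32) -> int:
    """Largest side when rescaling by up to random_coef, rounded up to a
    multiple of step (an assumption that matches the logged 608 x 608)."""
    return math.ceil(init * random_coef / step) * step


def total_loss(class_loss: float, iou_loss: float) -> float:
    """Per the Region lines in the log, total_loss = class_loss + iou_loss."""
    return class_loss + iou_loss


side = max_train_size(416, 1.40)
print(side)                                         # 608
# Pixels per image at 608 x 608 relative to 416 x 416
print(round((side / 416) ** 2, 2))                  # 2.14
# Region 161: class_loss = 328.948456, iou_loss = 0.013824
print(round(total_loss(328.948456, 0.013824), 6))   # 328.96228
```

At 608 x 608 each image has roughly 2.14 times the pixels of a 416 x 416 image, so the per-image feature maps in those iterations are proportionally larger than the layer table suggests.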

**cuDNN status Error in: file: C:/Darknet/darknet-master/darknet-master/src/convolutional_kernels.cu : backward_convolutional_layer_gpu() : line: 854 : build time: Jul 7 2022 - 01:31:44**

**cuDNN Error: CUDNN_STATUS_INTERNAL_ERROR**
**Darknet error location: C:\Darknet\darknet-master\darknet-master\src\dark_cuda.c, cudnn_check_error, line #204**
**cuDNN Error: CUDNN_STATUS_INTERNAL_ERROR: No error**