Open Jakub-Svoboda opened 6 years ago
Hi,
the image path is not the same as the one given in your data file:
train = /home/kuba/yolo/darknet/dataset/train.txt
Check the paths in your train.txt.
The paths were set correctly. I actually fixed this error by renaming all the .jpeg files to .jpg.
@Jakub-Svoboda Does renaming the .jpeg files to .jpg really fix it? I have the same problem with the same dataset.
@FreeKingofNature Yes, it did work for me. I renamed all the files to .jpg and changed the paths in the train.txt file appropriately.
@Jakub-Svoboda Some regions, such as 94, 82, and 106, sometimes show "-nan". Have you ever run into this? PS: I use the same dataset.
Region 94 Avg IOU: 0.806114, Class: 0.999890, Obj: 0.996381, No Obj: 0.000783, .5R: 1.000000, .75R: 1.000000, count: 2
Region 106 Avg IOU: 0.850631, Class: 0.999968, Obj: 0.979259, No Obj: 0.000066, .5R: 1.000000, .75R: 1.000000, count: 1
Region 82 Avg IOU: 0.835386, Class: 0.999710, Obj: 0.998900, No Obj: 0.011948, .5R: 1.000000, .75R: 0.875000, count: 8
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000793, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.812951, Class: 0.999933, Obj: 0.999951, No Obj: 0.002752, .5R: 1.000000, .75R: 0.500000, count: 2
Region 94 Avg IOU: 0.791764, Class: 0.999959, Obj: 0.956886, No Obj: 0.000952, .5R: 1.000000, .75R: 1.000000, count: 2
Region 106 Avg IOU: 0.746094, Class: 0.999896, Obj: 0.812066, No Obj: 0.000048, .5R: 1.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: 0.872028, Class: 0.999869, Obj: 0.999937, No Obj: 0.001653, .5R: 1.000000, .75R: 1.000000, count: 1
Region 94 Avg IOU: 0.859089, Class: 0.999578, Obj: 0.798197, No Obj: 0.001350, .5R: 1.000000, .75R: 1.000000, count: 5
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.871236, Class: 0.999982, Obj: 0.999019, No Obj: 0.002857, .5R: 1.000000, .75R: 1.000000, count: 3
Region 94 Avg IOU: 0.838271, Class: 0.998296, Obj: 0.991687, No Obj: 0.000716, .5R: 1.000000, .75R: 1.000000, count: 1
Region 106 Avg IOU: 0.784796, Class: 0.999965, Obj: 0.995162, No Obj: 0.000059, .5R: 1.000000, .75R: 1.000000, count: 1
Region 82 Avg IOU: 0.856737, Class: 0.999850, Obj: 0.997983, No Obj: 0.006347, .5R: 1.000000, .75R: 1.000000, count: 6
Region 94 Avg IOU: 0.636101, Class: 0.999767, Obj: 0.582711, No Obj: 0.000054, .5R: 1.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.208812, Class: 0.999890, Obj: 0.430457, No Obj: 0.000025, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: 0.882747, Class: 0.999512, Obj: 0.994097, No Obj: 0.004860, .5R: 1.000000, .75R: 1.000000, count: 3
Region 94 Avg IOU: 0.843018, Class: 0.999823, Obj: 0.998724, No Obj: 0.001323, .5R: 1.000000, .75R: 1.000000, count: 3
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000014, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.763332, Class: 0.999409, Obj: 0.798982, No Obj: 0.001761, .5R: 1.000000, .75R: 0.714286, count: 7
NaNs are fine as long as they only show up when the count is 0. This just means that this batch of images doesn't have any objects that show up at that particular scale (note it often happens in Layer 106, which deals with the smallest objects). Thus when it tries to calculate averages, it divides by 0 and gets nan. Not a problem.
NaNs are a problem when training goes off the rails, but then your whole screen will be full of them.
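To illustrate the explanation above, here is a minimal sketch in C of how an average over zero objects turns into nan. The helpers `naive_mean` and `nan_is_benign` are hypothetical names made up for this example; they are not darknet functions, and darknet's own bookkeeping may differ in detail.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not darknet's code: the mean of `count` values,
 * computed as a running sum divided by the count. */
float naive_mean(const float *values, size_t count)
{
    assert(values != NULL || count == 0);

    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    /* With count == 0 this evaluates 0.0f / 0.0f, which is NaN under
     * IEEE 754 floating-point arithmetic. */
    return sum / (float)count;
}

/* Returns 1 when a NaN statistic is the harmless kind described above:
 * it is NaN only because no objects were counted at that scale. */
int nan_is_benign(float stat, size_t count)
{
    return isnan(stat) && count == 0;
}
```

Called with `count == 0`, `naive_mean` returns NaN and `nan_is_benign` reports it as harmless, which corresponds to the `count: 0` lines in the log. A NaN alongside a non-zero count would be the "off the rails" case instead.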
@pjreddie Thank you for your reply. I got it! Your work is amazing! ^_^
I have tried training YOLOv3 on the Pascal VOC dataset and the training went fine, so now I am trying to train YOLOv3 on my custom class. I have a dataset in VOC format for which I have generated the labels with the script provided. Now I have all the images in the /JPEGImages/ folder and all the generated annotation files in the /labels/ folder. I have created a new config file called "yolov3-head.cfg", changed the number of classes in each yolo layer to 1, and changed the number of filters in the layer above each yolo layer to 18. I have set batch=64 and subdivisions=16. I have also created a new cfg/head.data file which looks like this:
classes= 1
train = /home/kuba/yolo/darknet/dataset/train.txt
valid = /home/kuba/yolo/darknet/dataset/val.txt
names = data/head.names
backup = backup
and also a new data/head.names which contains a single class name: head.
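The value 18 for the filters follows from the usual YOLOv3 rule: each anchor box predicts 4 box coordinates, 1 objectness score, and one score per class. A small sketch of that arithmetic, assuming 3 anchors per yolo layer (the number of `mask` entries in the stock yolov3.cfg); `yolo_filters` is a hypothetical helper name, not part of darknet:

```c
#include <assert.h>

/* Hypothetical helper: number of filters for the convolutional layer that
 * feeds a [yolo] layer. Each anchor predicts 4 box coordinates,
 * 1 objectness score, and `classes` class scores. */
int yolo_filters(int classes, int anchors_per_scale)
{
    assert(classes > 0 && anchors_per_scale > 0);
    return (classes + 5) * anchors_per_scale;
}
```

With one class and three anchors per scale, `yolo_filters(1, 3)` gives (1 + 5) * 3 = 18, which matches the setting described above.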
My train.txt file contains full paths to the individual images:
/home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/JPEGImages/mov_001_007585.jpeg
/home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/JPEGImages/mov_001_007587.jpeg
/home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/JPEGImages/mov_001_007589.jpeg
...
The problem arises when I start the training with:
./darknet detector train cfg/head.data cfg/yolov3-head.cfg darknet53.conv.74
For some reason darknet is looking for the .jpeg files in the /labels/ folder and I receive this error:
Couldn't open file: /home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/labels/mov_001_130285.jpeg
Couldn't open file: /home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/labels/mov_013_117936.jpeg
Couldn't open file: /home/kuba/yolo/darknet/dataset/VOCdevkit/VOC2007/labels/mov_001_156342.jpeg
I have tried copying every .jpeg file from the /JPEGImages/ to the /labels/ folder, but then the training numbers are all -nan:
Loaded: 0.589299 seconds
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.484583, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.493059, .5R: -nan, .75R: -nan, count: 0
For me it is surely a problem with how "src/data.c" handles image extensions. YOLOv3 only has the following lines:
find_replace(labelpath, "JPEGImages", "labels", labelpath);
find_replace(labelpath, ".jpg", ".txt", labelpath);
find_replace(labelpath, ".JPG", ".txt", labelpath);
find_replace(labelpath, ".JPEG", ".txt", labelpath);
So for ".jpeg", ".png", and ".PNG" there is no corresponding function call, and the generated label file name is wrong. Adding
find_replace(labelpath, ".jpeg", ".txt", labelpath);
find_replace(labelpath, ".png", ".txt", labelpath);
find_replace(labelpath, ".PNG", ".txt", labelpath);
at each such place in "src/data.c" solves the problem for me.
Thanks, I use PNG images and the training was always nan. Your post solves my problem.
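The effect of the missing rules can be reproduced with a small self-contained sketch. `replace_first` below is a simplified stand-in written for this example, not darknet's actual `find_replace` (it takes a buffer size and edits in place), and `label_path_for` with its `patched` flag is likewise hypothetical. The substitution rules themselves are the ones quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF_LEN 4096

/* Simplified stand-in for darknet's find_replace (an assumption for this
 * sketch): replace the first occurrence of `orig` in `path` with `rep`,
 * in place. Leaves `path` unchanged when `orig` does not occur. */
static void replace_first(char *path, size_t size,
                          const char *orig, const char *rep)
{
    char buffer[PATH_BUF_LEN];
    char *hit = strstr(path, orig);
    if (hit == NULL) {
        return;
    }
    int prefix_len = (int)(hit - path);
    snprintf(buffer, sizeof buffer, "%.*s%s%s",
             prefix_len, path, rep, hit + strlen(orig));
    snprintf(path, size, "%s", buffer);
}

/* Hypothetical helper: derive the label path from an image path.
 * patched == 0 applies only the rules quoted from src/data.c;
 * patched != 0 also applies the three rules proposed in this thread. */
void label_path_for(const char *image_path, char *label_path,
                    size_t size, int patched)
{
    assert(size > 0);
    snprintf(label_path, size, "%s", image_path);

    replace_first(label_path, size, "JPEGImages", "labels");
    replace_first(label_path, size, ".jpg", ".txt");
    replace_first(label_path, size, ".JPG", ".txt");
    replace_first(label_path, size, ".JPEG", ".txt");

    if (patched) {
        replace_first(label_path, size, ".jpeg", ".txt");
        replace_first(label_path, size, ".png", ".txt");
        replace_first(label_path, size, ".PNG", ".txt");
    }
}
```

For an input ending in `JPEGImages/mov_001_007585.jpeg`, the unpatched rules yield a path ending in `labels/mov_001_007585.jpeg`: the folder is swapped but the extension is not, which has the same shape as the "Couldn't open file" paths reported in this issue. With the extra rules the result ends in `labels/mov_001_007585.txt`.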