Training Accuracy starts dropping after 150k iterations (coco+pascal dataset)

pjspillai commented 5 years ago

Hi, I have been training yolo3 on the COCO+Pascal dataset (on a subset of 6 classes, out of the 80). I tried two different experiment setups: a) a GTX 1070 and b) two GTX 1080 Tis. Both experiments show a drop in the overall mAP score as well as in the individual precision/recall scores for the 5 classes.

I changed the yolov3.cfg file with the new filters at the appropriate places. Any idea on why this might be occurring?

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
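For reference, the `filters=33` value in the [convolutional] block before each [yolo] layer is consistent with `classes=6`. A minimal sketch of the arithmetic, assuming the usual YOLOv3 layout of 3 masks per [yolo] layer and 4 box coordinates plus 1 objectness score per anchor (the helper name `yolo_filters` is hypothetical, not part of darknet):

```python
def yolo_filters(classes: int, masks_per_layer: int = 3) -> int:
    """Filters for the [convolutional] layer directly before a [yolo] layer.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return (classes + 5) * masks_per_layer


if __name__ == "__main__":
    print(yolo_filters(6))  # value for the 6-class cfg above
    print(yolo_filters(5))  # value a 5-class cfg would need
```

If the cfg had been set up for 5 classes instead, the same formula would give a different filter count, so a class-count mismatch would show up here.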

AlexeyAB commented 5 years ago

@pjspillai Hi,

What mAP did you get before 150k and after 150k?
What soft/script did you use to measure accuracy?
Can you rename your cfg-file to txt-file and drag-n-drop it to your message?
Did you check that you correctly have merged 2 datasets into 1?
Did you train for 5 or 6 classes?

I have been training yolo3 on the COCO+Pascal dataset (on a subset of 5 classes, out of the 80).

[yolo] mask = 0,1,2 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=6
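One way to approach the dataset-merge question is to scan the label files for class ids that fall outside the 6-class range, for example an id left over from the original 20-class Pascal VOC or 80-class COCO numbering. The sketch below assumes darknet-style label lines (`class x_center y_center width height`, with coordinates normalised to [0, 1]); the function name `find_label_problems` is hypothetical.

```python
def find_label_problems(lines, num_classes=6):
    """Return (line_number, message) pairs for suspicious label lines."""
    problems = []
    for n, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            problems.append((n, "expected 5 fields"))
            continue
        try:
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append((n, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((n, f"class id {class_id} out of range"))
        if any(not 0.0 <= c <= 1.0 for c in coords):
            problems.append((n, "coordinate outside [0, 1]"))
    return problems


if __name__ == "__main__":
    sample = [
        "0 0.5 0.5 0.2 0.3",   # valid
        "14 0.5 0.5 0.2 0.3",  # class id not remapped to the 6-class list
        "2 1.5 0.5 0.2 0.3",   # x_center not normalised
    ]
    for line_number, message in find_label_problems(sample):
        print(line_number, message)
```

Running this over every label file from both source datasets would flag ids that were not remapped to the shared class list.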

pjspillai commented 5 years ago

Hi Alexey,

- What mAP did you get before 150k and after 150k?
  1000: 7.94
  10000: 36.8
  20000: 45.69
  100000: 47.3
  150000: 37.28
  200000: 43.84
  300000: 46.98
  350000: 37.20
  (It keeps on fluctuating up/down with every 50k iterations)
- What soft/script did you use to measure accuracy? I actually used your fork of the darknet for the mAP score computation.
- Can you rename your cfg-file to txt-file and drag-n-drop it to your message? Attached yolov3.txt
- Did you check that you correctly have merged 2 datasets into 1? I checked them yes.
- Did you train for 5 or 6 classes? I trained for 6 classes [person, bicycle, bus, car, motorbike, truck]
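The mAP readings reported above can be tabulated to see which checkpoint scored best and where the large drops fall. This is only a sketch over the numbers quoted in this comment; the helper names and the 5-point drop threshold are arbitrary assumptions.

```python
# mAP (%) by training iteration, as reported in this thread.
readings = {
    1000: 7.94,
    10000: 36.8,
    20000: 45.69,
    100000: 47.3,
    150000: 37.28,
    200000: 43.84,
    300000: 46.98,
    350000: 37.20,
}


def best_checkpoint(scores):
    """Return (iteration, mAP) for the highest-scoring checkpoint."""
    return max(scores.items(), key=lambda kv: kv[1])


def large_drops(scores, threshold=5.0):
    """Return (from_iter, to_iter) pairs where mAP fell by at least `threshold`."""
    iterations = sorted(scores)
    return [
        (a, b)
        for a, b in zip(iterations, iterations[1:])
        if scores[a] - scores[b] >= threshold
    ]


if __name__ == "__main__":
    print("best:", best_checkpoint(readings))
    print("drops:", large_drops(readings))
```

On these numbers the best reading is the 100000-iteration checkpoint, and the two large drops are 100000 to 150000 and 300000 to 350000, which matches the up/down pattern described.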

AlexeyAB commented 5 years ago

@pjspillai

What validation dataset do you use?

Is it the same as training dataset?
Or is it combined PascalVOC valid dataset + MS COCO valid dataset?

Why did you set batch=16 subdivisions=8 instead of, for example, batch=64 subdivisions=32?

pjspillai commented 5 years ago

- What validation dataset do you use? It's the subset of the COCO 5k val images containing the 6 classes of interest, plus the Pascal subset.
- Why did you set batch=16 subdivisions=8 instead of, for example, batch=64 subdivisions=32? It gave me a CUDA memory error on the 1070 GPU, so I lowered the batch/subdivisions values to get training to work. I am re-running the training with batch=64, subdivisions=32 on the multi-GPU setup, and it's training so far without any CUDA errors. Will wait and see how this performs.

Not sure if the batch/subdivisions size will affect the mAP score significantly (though ideally it's supposed to).
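To make the batch/subdivisions comparison concrete: as I understand darknet's cfg semantics, `batch` is the number of images contributing to one weight update, and it is processed in `subdivisions` chunks, so roughly `batch / subdivisions` images are on the GPU per forward pass. A small sketch under that assumption (the helper name is hypothetical):

```python
def images_per_forward_pass(batch: int, subdivisions: int) -> int:
    """Images processed at once, assuming batch is split evenly into subdivisions."""
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError("batch must be a positive multiple of subdivisions")
    return batch // subdivisions


if __name__ == "__main__":
    for batch, subdivisions in [(16, 8), (64, 32)]:
        per_pass = images_per_forward_pass(batch, subdivisions)
        print(f"batch={batch} subdivisions={subdivisions}: "
              f"{per_pass} images per pass, {batch} images per weight update")
```

Under this reading both settings put the same number of images on the GPU per pass, but batch=64 averages each update over four times as many images, which could plausibly give a less noisy training signal.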

pjreddie / darknet

Training Accuracy starts dropping after 150k iterations (coco+pascal dataset) #1467