pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

My loss immediately dropped to zero and the weights don't work #2225

Open jack04060201 opened 4 years ago

jack04060201 commented 4 years ago

Hello, I'm training on Colab but my loss chart looks weird (chart attached). When I start training, the loss drops very fast within the first hundred iterations. I've checked my cfg, data, names, and txt files many times, but it still doesn't work. My config:

[net]
# Testing
batch=64
subdivisions=64
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 4000
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=6
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=6
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
akrawat912 commented 4 years ago

No idea about the specific cause, but you are using max_batches=4000 for 6 classes? Try 24000 [classes * 4000].
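
That suggestion only touches a couple of lines in the [net] section. A minimal sketch of what they would look like, assuming the 24000 figure (the steps values here are set to roughly 80% and 90% of max_batches, a convention the next reply spells out):

# example values only, not a tested config
max_batches = 24000
policy=steps
steps=19200,21600
scales=.1,.1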

goga1902 commented 4 years ago

Firstly, why are you training 6 classes for only 4000 batches? Each class should be trained for at least 2000 batches, so you should have 6 classes * 2000 batches = 12000 batches. The steps should be 80 and 90 percent of the number of batches, which means 9600 and 10800.

learning_rate=0.001
burn_in=1000
max_batches = 12000
policy=steps
steps=9600,10800
scales=.1,.1
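
After editing the cfg, training would be relaunched with the usual darknet command; the file names below (obj.data, yolov3-tiny-custom.cfg, yolov3-tiny.conv.15) are placeholders for your own data file, modified cfg, and pretrained weights:

./darknet detector train obj.data yolov3-tiny-custom.cfg yolov3-tiny.conv.15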