Training results loss: nan avg loss: nan

LeeroyHannigan commented 6 years ago

Hi all, apologies, this is my first time using object detection of any sort, and in fact my first time using Python. While trying to train on my dataset, the loss comes out as nan. The warning I receive is: RuntimeWarning: invalid value encountered in sqrt obj[4] = np.sqrt(obj[4])

Sample annotation xml file

<annotation>
  <folder>images</folder>
  <filename>000006.png</filename>
  <segmented>0</segmented>
  <size>
    <width>225</width>
    <height>225</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>35</xmin>
      <ymin>93</ymin>
      <xmax>45</xmax>
      <ymax>45</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>69</xmin>
      <ymin>94</ymin>
      <xmax>77</xmax>
      <ymax>77</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>129</xmin>
      <ymin>85</ymin>
      <xmax>136</xmax>
      <ymax>136</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>145</xmin>
      <ymin>98</ymin>
      <xmax>153</xmax>
      <ymax>153</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>81</xmin>
      <ymin>143</ymin>
      <xmax>92</xmax>
      <ymax>92</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>185</xmin>
      <ymin>188</ymin>
      <xmax>195</xmax>
      <ymax>195</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>77</xmin>
      <ymin>189</ymin>
      <xmax>93</xmax>
      <ymax>93</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>75</xmin>
      <ymin>204</ymin>
      <xmax>89</xmax>
      <ymax>89</ymax>
    </bndbox>
  </object>
</annotation>

cfg file

[net]
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 40100
policy=steps
steps=-1,100,20000,30000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .5
random=1

I hope someone can point me in the right direction. It's the first time I've tried training my own model. Thanks, Lee

leadcain84 commented 6 years ago

If you are training on your own custom dataset and this is the initial training run, you should use 'burn_in', because the gradient and loss are very unstable early in training. Check the code and search for 'seen' in https://github.com/marvis/pytorch-yolo2/blob/master/train.py
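To illustrate what a burn-in (warm-up) schedule does, here is a minimal sketch in Python. It assumes a Darknet-style polynomial ramp; the function name and the values burn_in=1000 and power=4.0 are illustrative assumptions, and nothing in this thread confirms that darkflow itself reads a burn_in setting.

```python
def burn_in_learning_rate(base_lr, batch_num, burn_in=1000, power=4.0):
    """Hypothetical helper: ramp the learning rate up during the first batches.

    For batch_num < burn_in the rate grows as (batch_num / burn_in) ** power,
    so the earliest updates are tiny. From burn_in onward the base rate is used.
    """
    if batch_num < burn_in:
        return base_lr * (batch_num / burn_in) ** power
    return base_lr


# Print the rate at a few points, using learning_rate=0.001 from the cfg above.
for batch in (0, 250, 500, 1000, 2000):
    print(batch, burn_in_learning_rate(0.001, batch))
```

A warm-up like this only dampens early instability. It cannot repair a loss that is nan because of invalid input data.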

Salt-wmx commented 6 years ago

I have the same question... Do you have any idea about this?

kribby commented 6 years ago

This script apparently helps you identify when the NaNs start to occur:

https://gist.github.com/yuq-1s/ce63a306f1d39d1c0c80d33f7855f3b5

But I am not sure how to use it with darkflow. If you work it out, please let me know.

naelabdeljawad commented 5 years ago

You are getting nan because ymin is bigger than ymax:

<bndbox>
  <xmin>69</xmin>
  <ymin>94</ymin>
  <xmax>77</xmax>
  <ymax>77</ymax>
</bndbox>

Make sure that min is smaller than max for both x and y.
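The check above can be automated before training. The sketch below uses only the Python standard library to list every box whose min is not strictly smaller than its max. The function name find_invalid_boxes and the inline SAMPLE string are made up for illustration; in practice you would read each annotation file from disk. The likely mechanism is an assumption, not something confirmed in this thread: if the box height is computed as ymax minus ymin before the np.sqrt call from the warning, a box with ymin > ymax gives a negative value, and its square root is nan.

```python
import xml.etree.ElementTree as ET


def find_invalid_boxes(xml_text):
    """Return (index, name, xmin, ymin, xmax, ymax) for each box with min >= max."""
    root = ET.fromstring(xml_text)
    bad = []
    for i, obj in enumerate(root.iter("object")):
        box = obj.find("bndbox")
        # Go through float() first so values such as "35.0" are also accepted.
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmin >= xmax or ymin >= ymax:
            bad.append((i, obj.findtext("name"), xmin, ymin, xmax, ymax))
    return bad


# Illustrative input: the first box is the one quoted above, the second is valid.
SAMPLE = """<annotation>
  <object>
    <name>person</name>
    <bndbox><xmin>69</xmin><ymin>94</ymin><xmax>77</xmax><ymax>77</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>30</xmax><ymax>40</ymax></bndbox>
  </object>
</annotation>"""

for entry in find_invalid_boxes(SAMPLE):
    print("invalid box:", entry)
```

In the sample annotation from the original post, every ymax equals its xmax, so whatever generated the XML may be writing the wrong value into ymax. That is worth checking before fixing boxes by hand.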

thtrieu / darkflow

Training results loss: nan avg loss: nan #697