zhaoweicai / mscnn

Caffe implementation of our multi-scale object detection framework

Training on own dataset : nan values #53

Closed MSutt closed 7 years ago

MSutt commented 7 years ago

Hi,

First of all, thanks for your work. I trained the kitti_car/mscnn-7s-576-2x example and everything went fine. I then tried to train the network on my own dataset, and some "nan" values appeared. Here are my log files, log_1st.txt and log_2nd.txt. In the first step, the nan values appear only in the testing phase (every 1000 iterations), always on the same outputs: 9, 11, 13, 18, 19, 20. In the second step, more outputs are constantly nan: the loss value at every iteration, training net output 38, and test net outputs 14, 15, and 38.

I tried reducing the learning rate, and I also tried reducing the test data to 200 images. Neither solved the problem. Could anybody help me find the error?

zhaoweicai commented 7 years ago

In log_1st.txt there is no "nan" in the training outputs, which means training works well. "nan" only appears in the "3_5x5", "3_7x7" and "4_5x5" layers. Most likely there is no "positive" data in the test set for those three branches; you should check that. In log_2nd.txt, training diverges right at the beginning. Consider lowering the learning rate; it might help.
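To see quickly which outputs go nan and in which phase, something like the following can help. It is only a rough sketch: it assumes the usual Caffe solver log lines of the form "Train net output #N: name = value" and "Test net output #N: name = value", and the blob names in the sample lines are made up for illustration, not taken from the attached logs.

```python
import re
from collections import defaultdict

# Assumed log line shapes (standard Caffe solver output):
#   "... Train net output #38: loss = nan (* 1 = nan loss)"
#   "... Test net output #9: some_blob = nan"
OUTPUT_RE = re.compile(r"(Train|Test) net output #(\d+): (\S+) = (\S+)")


def find_nan_outputs(log_lines):
    """Return {phase: {output_index: blob_name}} for outputs reported as nan."""
    found = defaultdict(dict)
    for line in log_lines:
        match = OUTPUT_RE.search(line)
        if match is None:
            continue
        phase, index, name, value = match.groups()
        # Caffe may print either "nan" or "-nan" depending on the platform.
        if value.lower() in ("nan", "-nan"):
            found[phase][int(index)] = name
    return {phase: dict(sorted(outs.items())) for phase, outs in found.items()}


if __name__ == "__main__":
    # Hypothetical sample lines; replace with open("log_1st.txt") to scan a real log.
    sample = [
        "I0101 solver.cpp:228]     Train net output #0: loss = 0.5 (* 1 = 0.5 loss)",
        "I0101 solver.cpp:404]     Test net output #9: acc_3_5x5 = nan",
        "I0101 solver.cpp:404]     Test net output #11: acc_3_7x7 = -nan",
    ]
    print(find_nan_outputs(sample))
```

If nan shows up only under "Test", as in the first log, the training itself is probably fine and the test data for those branches is the thing to inspect. If it shows up under "Train" from the first iterations, the run has diverged.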

MSutt commented 7 years ago

Thanks, I reduced the learning rate of the second phase to 5e-06 and it worked. I still get nan values in the "3_5x5", "3_7x7" and "4_5x5" layers, but you are right: this is because I don't have data corresponding to those cases.

moyans commented 7 years ago

@MSutt Can you help me, please? On my own dataset (similar to the VOC dataset), the window file looks like this:

0

/home/ky/moyan/VOC0712/JPEGImages/100360.jpg 3 1264 1600 3 1 1 65 66 1141 1026 3 1 171 652 1054 945 2 1 175 263 1050 627 0

1

/home/ky/moyan/VOC0712/JPEGImages/100961.jpg 3 1264 1600 3 1 1 567 53 1547 1138 3 1 678 736 1444 988 2 1 670 437 1440 718 0

I want to know: is 'ignore' always 1? Also, my dataset doesn't have any 'region of non-interest' windows, so is 'num_roni' always 0?
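For checking what each field in such a record is being read as, here is a small sketch that splits one record into named fields. The field order is an assumption inferred from the records and the question above (path, channels, height, width, number of windows, then label, ignore, x1, y1, x2, y2 per window, then num_roni followed by four coordinates per region), and parse_window_record is a hypothetical helper, not part of mscnn.

```python
def parse_window_record(line):
    """Split one whitespace-separated window record into named fields.

    Assumed layout (not confirmed by the mscnn sources):
      path channels height width num_windows
      [label ignore x1 y1 x2 y2] * num_windows
      num_roni [x1 y1 x2 y2] * num_roni
    """
    tokens = line.split()
    path = tokens[0]
    channels, height, width, num_windows = (int(t) for t in tokens[1:5])

    pos = 5
    windows = []
    for _ in range(num_windows):
        label, ignore, x1, y1, x2, y2 = (int(t) for t in tokens[pos:pos + 6])
        windows.append({"label": label, "ignore": ignore, "box": (x1, y1, x2, y2)})
        pos += 6

    num_roni = int(tokens[pos])
    pos += 1
    ronis = []
    for _ in range(num_roni):
        ronis.append(tuple(int(t) for t in tokens[pos:pos + 4]))
        pos += 4

    # Leftover tokens mean the record does not match the assumed layout.
    if pos != len(tokens):
        raise ValueError("unexpected trailing fields in record")

    return {
        "path": path,
        "channels": channels,
        "height": height,
        "width": width,
        "windows": windows,
        "ronis": ronis,
    }


if __name__ == "__main__":
    record = (
        "/home/ky/moyan/VOC0712/JPEGImages/100360.jpg 3 1264 1600 3 "
        "1 1 65 66 1141 1026 3 1 171 652 1054 945 2 1 175 263 1050 627 0"
    )
    parsed = parse_window_record(record)
    for window in parsed["windows"]:
        print(window)
    print("num_roni:", len(parsed["ronis"]))
```

Under that reading, both records above contain three windows whose 'ignore' field is 1 and no region of non-interest.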