hx173149 / C3D-tensorflow

C3D is a modified version of BVLC tensorflow to support 3D ConvNets.
MIT License

After some steps while training, the weights and loss value will be NaN. #56

Open zeynepgokce opened 6 years ago

zeynepgokce commented 6 years ago

Hello everyone,

I have a question about training this model with a different dataset.

When I fine-tune the C3D model on UCF101 data, there is no problem. But when I change the dataset, the loss becomes NaN.

I tried several things to fix this, none of which solved it: 1) changed the learning rate, 2) changed the batch size, 3) tried small train and test splits (with #num : 3).

For instance, these are training steps of the same model with a different dataset. The learning rate and batch size are the same as in the original model; only the dataset is changed.

('Step : ', 31)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.00000
(' Loss : ', array([ 2.60935736,  2.62928033,  2.65052104,  2.6719377 ,  2.69320059,
        2.7358048 ,  2.73551226,  2.73502755,  3.95449066,  3.61877584,
        2.61726952,  2.60790229,  2.60790229,  2.60790229,  2.60790229,
        2.60790229,  2.60790229,  2.60790229,  2.60790229,  2.60790229,
        2.60790229,  2.60790229], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000
('Step : ', 32)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.10000
(' Loss : ', array([ 3.21029162,  3.2302146 ,  3.25145531,  3.27287173,  3.29413438,
        3.33673644,  3.33644032,  3.33594394,  4.55498886,  4.21940422,
        3.21820426,  3.20883656,  3.20883656,  3.20883656,  3.20883656,
        3.20883656,  3.20883656,  3.20883656,  3.20883656,  3.20883656,
        3.20883656,  3.20883656], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.20000
('Step : ', 33)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.30000
(' Loss : ', array([ 2.50483251,  2.52475548,  2.54599619,  2.56741214,  2.58867455,
        2.63127494,  2.63097525,  2.63046718,  3.84910154,  3.51364231,
        2.51274562,  2.50337744,  2.50337744,  2.50337744,  2.50337744,
        2.50337744,  2.50337744,  2.50337744,  2.50337744,  2.50337744,
        2.50337744,  2.50337744], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000
('Step : ', 34)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.00000
(' Loss : ', array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000

Where could the problem be? Why does this model not work with a different dataset? Any suggestions? Thank you.

zkjqw139 commented 6 years ago

Try using

    labels = tf.one_hot(labels_placeholder, c3d_model.NUM_CLASSES)
    loss = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(tf.nn.softmax(logit), 1e-10, 1.0)), 1)

instead.
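The effect of the clipping in the suggestion above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not code from the repository: `softmax` and `clipped_cross_entropy` are hypothetical helper names, and the lower bound of 1e-10 is taken from the comment above. It shows that when a class probability underflows to 0, clipping keeps the logarithm finite.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_cross_entropy(logits, label, eps=1e-10):
    """Cross-entropy for one example, with the probability clipped to [eps, 1.0]."""
    probs = softmax(logits)
    p = min(max(probs[label], eps), 1.0)
    return -math.log(p)


# The probability of class 1 underflows to exactly 0.0 here, so an
# unclipped log would fail; the clipped loss stays finite.
logits = [1000.0, 0.0, 0.0]
assert softmax(logits)[1] == 0.0
loss = clipped_cross_entropy(logits, label=1)
print(math.isfinite(loss))
```

Clipping only guards against taking the log of 0. It does not correct labels that fall outside the valid class range.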

491506870 commented 6 years ago

@zeynepgokce Hi, could you tell me how to print the loss? Where should I add the code? Thank you.

zeynepgokce commented 6 years ago

@491506870 Hi, I simply added "loss" to the session run call, as follows:

    summary, acc, l = sess.run(
        [merged, accuracy, loss],
        feed_dict={images_placeholder: train_images,
                   labels_placeholder: train_labels})
    print("accuracy: " + "{:.5f}".format(acc))
    print(" Loss : ", l)
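Once the loss is being printed, it can also be checked automatically. The following is a minimal sketch, assuming the per-example loss values are available as a flat sequence of floats like the arrays printed earlier in this thread. `first_nonfinite` is a hypothetical helper, not part of the repository, and the sample values are shortened from the logs above.

```python
import math


def first_nonfinite(values):
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Shortened loss values in the style of steps 33 and 34 from the log.
step_33 = [2.50483251, 2.52475548, 2.54599619]
step_34 = [float("nan"), float("nan"), float("nan")]

print(first_nonfinite(step_33))  # None
print(first_nonfinite(step_34))  # 0
```

Stopping training at the first step where this returns an index makes it easier to inspect the batch (images and labels) that triggered the NaN.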

491506870 commented 6 years ago

@zeynepgokce Thank you so much, that helps me a lot! I think I have hit the same problem as you: my loss became NaN after several steps. What did you do to solve it?

zeynepgokce commented 6 years ago

@491506870 The problem was related to the labelling of my own dataset. I changed my labels to run from 0 to 2, since I have 3 classes.
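The fix described above can be sketched as a check to run before training. This is a minimal sketch, not code from the repository: `remap_labels` and `check_labels` are hypothetical helpers, and the raw labels (1, 2, 3) are an assumption, since the thread does not say what the original labelling was. It maps raw labels to contiguous ids starting at 0 and verifies that every id lies in [0, num_classes).

```python
def remap_labels(raw_labels):
    """Map arbitrary raw labels to contiguous integer ids 0..K-1.

    Returns the remapped list and the raw-label -> id mapping.
    """
    mapping = {raw: idx for idx, raw in enumerate(sorted(set(raw_labels)))}
    return [mapping[raw] for raw in raw_labels], mapping


def check_labels(labels, num_classes):
    """Raise ValueError if any label is outside [0, num_classes)."""
    bad = sorted({l for l in labels if not 0 <= l < num_classes})
    if bad:
        raise ValueError(
            "labels out of range [0, %d): %s" % (num_classes, bad))


raw = [1, 3, 2, 3, 1]  # hypothetical labelling that starts at 1
labels, mapping = remap_labels(raw)
check_labels(labels, num_classes=3)  # passes after remapping
print(labels)   # [0, 2, 1, 2, 0]
print(mapping)  # {1: 0, 2: 1, 3: 2}
```

Calling `check_labels(raw, num_classes=3)` on the unmapped labels raises `ValueError`, because the label 3 is outside the range 0 to 2.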