Closed thuanvh closed 8 years ago
Hi!
I think that the problem could be anywhere.
Martin
Hi @martinkersner
I use the trained crfasrnn network to segment my images into 2 classes (background and foreground). My data is scaled to the range -1 to 1. The input size is 250x250 instead of 500x500. I train by customizing train_val.prototxt: I change num_output of some convolution layers from 21 to 2, and I add a weight filler and a bias filler to each Convolution and Deconvolution layer:
weight_filler {
  type: "xavier"
  std: 0.1
}
bias_filler {
  type: "constant"
  value: 0.2
}
Here is my output:
I0120 17:01:35.764663 7444 solver.cpp:473] Iteration 10000, lr = 1e-010
I0120 17:04:47.148594 7444 solver.cpp:213] Iteration 10100, loss = 51121.4
I0120 17:04:47.159598 7444 solver.cpp:228] Train net output #0: loss = 59859.5 (* 1 = 59859.5 loss)
I0120 17:04:47.161098 7444 solver.cpp:473] Iteration 10100, lr = 1e-010
I0120 17:08:00.796912 7444 solver.cpp:213] Iteration 10200, loss = 52975.5
I0120 17:08:00.798411 7444 solver.cpp:228] Train net output #0: loss = 55693 (* 1 = 55693 loss)
I0120 17:08:00.799911 7444 solver.cpp:473] Iteration 10200, lr = 1e-010
I0120 17:11:11.810928 7444 solver.cpp:213] Iteration 10300, loss = 52339.4
I0120 17:11:11.825934 7444 solver.cpp:228] Train net output #0: loss = 50822.4 (* 1 = 50822.4 loss)
I0120 17:11:11.834436 7444 solver.cpp:473] Iteration 10300, lr = 1e-010
I0120 17:14:22.996568 7444 solver.cpp:213] Iteration 10400, loss = 51125.5
I0120 17:14:22.998069 7444 solver.cpp:228] Train net output #0: loss = 63236 (* 1 = 63236 loss)
I0120 17:14:23.001070 7444 solver.cpp:473] Iteration 10400, lr = 1e-010
I0120 17:17:36.604817 7444 solver.cpp:213] Iteration 10500, loss = 51211.2
I0120 17:17:36.608319 7444 solver.cpp:228] Train net output #0: loss = 38913 (* 1 = 38913 loss)
I0120 17:17:36.613320 7444 solver.cpp:473] Iteration 10500, lr = 1e-010
I0120 17:20:48.551285 7444 solver.cpp:213] Iteration 10600, loss = 51850.9
I0120 17:20:48.580294 7444 solver.cpp:228] Train net output #0: loss = 46278.5 (* 1 = 46278.5 loss)
I0120 17:20:48.581795 7444 solver.cpp:473] Iteration 10600, lr = 1e-010
I0120 17:23:59.443377 7444 solver.cpp:213] Iteration 10700, loss = 50562.2
I0120 17:23:59.444877 7444 solver.cpp:228] Train net output #0: loss = 50714.7 (* 1 = 50714.7 loss)
I0120 17:23:59.445876 7444 solver.cpp:473] Iteration 10700, lr = 1e-010
I0120 17:27:09.559470 7444 solver.cpp:213] Iteration 10800, loss = 50792.1
I0120 17:27:09.560971 7444 solver.cpp:228] Train net output #0: loss = 53939.8 (* 1 = 53939.8 loss)
I0120 17:27:09.562472 7444 solver.cpp:473] Iteration 10800, lr = 1e-010
I0120 17:30:19.621381 7444 solver.cpp:213] Iteration 10900, loss = 50344.1
I0120 17:30:19.622881 7444 solver.cpp:228] Train net output #0: loss = 67609.6 (* 1 = 67609.6 loss)
"Data is scaled between -1 and 1." Which data do you mean, images or labels?
I don't understand why you scale your data to between -1 and 1. Although this probably does not affect your training, common practice is to keep images in the range 0-255 (and subtract their mean during training). Labels are usually integers in the range 0 to N, where N is the number of classes minus 1.
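To make the preprocessing described above concrete, here is a minimal sketch in NumPy. It assumes the usual Caffe conventions (BGR channel order, channel-first layout), which this thread does not confirm for your setup. `MEAN_BGR` and `preprocess_image` are hypothetical names, and the mean values are placeholders that you would replace with the mean of your own training set.

```python
import numpy as np

# Placeholder per-channel mean in BGR order (an assumption, not taken from
# this thread); compute the real values from your own training images.
MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)

def preprocess_image(rgb_image, mean_bgr=MEAN_BGR):
    """Turn an HxWx3 uint8 RGB image into a 3xHxW float32 BGR array
    with the per-channel mean subtracted. Pixel values stay on the
    0-255 scale; no rescaling to [-1, 1] is applied."""
    img = np.asarray(rgb_image).astype(np.float32)
    img = img[:, :, ::-1]            # RGB -> BGR (assumed network convention)
    img = img - mean_bgr             # broadcasts the mean over all pixels
    return np.ascontiguousarray(img.transpose(2, 0, 1))  # HWC -> CHW
```

The label image is left untouched by this step: it stays a single-channel array of integer class indices.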
Because you only use weight fillers and not weights already obtained from fcn-8 or crfasrnn, training is certainly going to take a long time. The constant (class 0) output of your network is likely caused by wrong weight initialization.
You can check my repo https://github.com/martinkersner/train-CRF-RNN.
I scale only images. Labels are 0 and 1.
I tried to use the crfasrnn weights, as in this file: train_val.prototxt.txt
I compared it with your file; it is the same except for the weight_filler. Don't you use fillers for the new convolution layers?
I tried training without fillers, using only the weights from crfasrnn and what you can see in solve.py. I should still try it with fillers, because after 35 thousand iterations it seems to me that the results are getting worse, even though the loss is slightly decreasing.
If you still have a problem with unchanging predictions, don't train for more than 500 iterations. I get pretty reasonable results even that early in training. Personally, I would guess that there is some problem with your training data.
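One quick way to rule out a training-data problem is to check the label images themselves. The sketch below is only an illustration (`label_report` is a hypothetical helper, not part of any repository mentioned here). It verifies that a label array contains only valid class indices and reports the fraction of pixels per class. A binary mask saved as 0/255 instead of 0/1 would raise an error here.

```python
import numpy as np

def label_report(label, num_classes=2, ignore_label=None):
    """Return the fraction of pixels per class for one label image.

    Raises ValueError if the label contains values outside
    0..num_classes-1 (other than the optional ignore_label).
    """
    label = np.asarray(label)
    if ignore_label is not None:
        valid = label[label != ignore_label]
    else:
        valid = label.ravel()
    unexpected = np.setdiff1d(np.unique(valid), np.arange(num_classes))
    if unexpected.size:
        raise ValueError("unexpected label values: %s" % unexpected.tolist())
    counts = np.bincount(valid.astype(np.int64), minlength=num_classes)
    total = counts.sum()
    if total == 0:
        return np.zeros(num_classes)
    return counts / float(total)
```

If the foreground fraction is tiny across the whole dataset, a network that predicts only background can already reach a low loss, which is worth knowing before changing anything else.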
After reviewing solve.py, I think the problem is that I did not initialize the weights of the Deconvolution layers as solve.py does. I used only the caffe build for training, not solve.py. Now I will try to use it. Thank you so much, Thuan
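For readers who have not seen this kind of initialization: FCN-style training scripts commonly fill Deconvolution layers with a fixed bilinear interpolation kernel instead of a random filler. The following is a minimal NumPy sketch of that idea. It is not copied from the solve.py discussed here, and `bilinear_weights` is a hypothetical helper name.

```python
import numpy as np

def upsample_filt(size):
    """Build a 2D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def bilinear_weights(channels_out, channels_in, size):
    """Return a (channels_out, channels_in, size, size) float32 array
    with the bilinear kernel on matching channel pairs and zeros
    elsewhere, so each channel is upsampled independently."""
    weights = np.zeros((channels_out, channels_in, size, size),
                       dtype=np.float32)
    filt = upsample_filt(size)
    for c in range(min(channels_out, channels_in)):
        weights[c, c] = filt
    return weights
```

With pycaffe, an array like this would typically be copied into the layer's weight blob, for example `net.params[name][0].data[...] = bilinear_weights(2, 2, 4)`, where `name` stands for one of your own Deconvolution layers and the array shape must match the blob's shape.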
@thuanvh Hi, have you solved your problem? And what accuracy do you get for two-class (i.e. background and people) labeling?
The problem is solved. I am collecting data for training. I have no accuracy measurements yet.
@thuanvh I have the same problem. I know it has been a long time since this post, but may I ask how you solved it? I have medical (CT) images and I want to use this for tumor segmentation, so I have just two classes, background and tumor. After training, all the predictions from the network are black.
Hi all, I am trying to train crfasrnn on my own images. My classes are background and people. But my training does not converge: the loss values do not decrease, and the prediction output of my network is only background (class 0) for all images. I think that I have made a mistake in my training. Have you ever encountered a similar case? Could you give me any suggestions?
Thank you, Thuan