jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation
GNU General Public License v3.0

Network always predicts all black mask #234

Open jbitton opened 5 years ago

jbitton commented 5 years ago

Hi there!

I've been trying to train a U-Net using your repo on the Kaggle ultrasound nerve dataset. However, no matter what I do, the mask I get is always all zeros. This is how I process the prediction:

import numpy as np

# Take the per-pixel argmax over the class channel of the first batch item.
mask = np.zeros((prediction.shape[1], prediction.shape[2]), dtype=np.uint8)
for i in range(prediction.shape[1]):
    for j in range(prediction.shape[2]):
        mask[i][j] = np.argmax(prediction[0][i][j])
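The same mask can also be computed with a single vectorized argmax over the class axis (a sketch assuming `prediction` has shape `(1, H, W, n_class)`; the toy tensor below is just for illustration):

```python
import numpy as np

# Toy prediction tensor of shape (1, H, W, n_class): batch of 1, 4x4 image, 2 classes.
rng = np.random.default_rng(0)
prediction = rng.random((1, 4, 4, 2)).astype(np.float32)

# One vectorized argmax over the last (class) axis replaces the double loop.
mask = np.argmax(prediction[0], axis=-1).astype(np.uint8)
```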

I've looked through previous issues, and I followed the code available in https://github.com/jakeret/tf_unet/issues/3#issuecomment-260112160. I have also tried to use the current code available in the scripts folder, but I had to modify tf_unet/image_util to load the images as grayscale.

I think the reason it always predicts no mask may be that the network achieves a high enough accuracy by doing so. What am I doing wrong here? How can I actually get predictions?

Here are some logs from the end of training:

2018-12-15 23:26:12,042 Iter 3198, Minibatch Loss= 0.4196, Training Accuracy= 0.8665, Minibatch error= 13.3%
2018-12-15 23:26:14,717 Epoch 99, Average loss: 0.5016, learning rate: 0.0010
2018-12-15 23:26:14,791 Verification error= 37.4%, loss= 0.7187
2018-12-15 23:26:20,868 Optimization Finished!
..................
Using data from: /scratch/jtb470/nerve_split_data/train
Number of data_files used: 3060
Testing error rate: 0.00%

Thanks in advance.

jakeret commented 5 years ago

hi @jbitton there might be several reasons why the model does not predict anything. Very recently someone found a bug in the `_process_data` function (#228). Maybe you're encountering the same issue. Another thing: if I remember correctly, the nerve segmentation dataset is highly imbalanced, so averaging the loss over all pixels is not well suited. Have you tried a different loss (e.g. `dice_coefficient`)?
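For reference, a soft dice loss can be sketched in plain NumPy (an illustrative sketch, not the repo's TensorFlow implementation; the `eps` smoothing term is an assumption to avoid division by zero):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft dice loss: 0 for perfect overlap, approaching 1 for none.

    pred   -- predicted foreground probabilities, shape (H, W)
    target -- binary ground-truth mask, shape (H, W)
    eps    -- smoothing term guarding against division by zero
    """
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (union + eps)

# On an imbalanced mask, predicting all background is heavily penalized:
target = np.zeros((8, 8)); target[0, 0] = 1.0   # one foreground pixel
all_black = np.zeros((8, 8))
print(dice_loss(all_black, target))  # close to 1.0 -> bad score
```

Unlike a loss averaged over all pixels, this score does not improve when the network simply predicts background everywhere.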

jbitton commented 5 years ago

@jakeret hi, thanks for the response. I added the if statements to guard against division-by-zero errors and changed my loss function to the dice coefficient. I just restarted training, and the training accuracy / minibatch loss still seem strange:

2018-12-19 11:17:55,138 Verification error= 100.0%, loss= -0.4902
2018-12-19 11:17:57,601 Start optimization
2018-12-19 11:18:22,779 Iter 0, Minibatch Loss= -0.6293, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:18:35,344 Iter 1, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:18:48,181 Iter 2, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:19:00,780 Iter 3, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:19:13,417 Iter 4, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:19:26,560 Iter 5, Minibatch Loss= -0.6047, Training Accuracy= 0.6047, Minibatch error= 39.5%
2018-12-19 11:19:39,370 Iter 6, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:19:52,543 Iter 7, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:20:05,163 Iter 8, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:20:17,818 Iter 9, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%
2018-12-19 11:20:30,857 Iter 10, Minibatch Loss= -1.0000, Training Accuracy= 1.0000, Minibatch error= 0.0%

I highly doubt the accuracy is so good. Anything else you can think of doing?

ChaoLi977 commented 5 years ago

I also got a similar prediction, all of the output is black. The minibatch loss is -1.00. Does anyone know what's wrong with it?

rytisss commented 5 years ago

I am in the same situation... (screenshot of an all-black prediction attached)

rytisss commented 5 years ago

I tried to make the dice_coefficient loss function range from 0 to 1 by writing `loss = 1.0 - (2 * intersection / union)`, but it still ends up the same. When I use cross_entropy as the loss function, I can train the net well. Maybe the problem is with the dice_coefficient loss function?
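One possible explanation for the saturated logs above (an illustrative NumPy sketch, not the repo's code): on samples whose ground-truth mask is entirely background, a smoothed dice score rates the all-black prediction as perfect, so empty-mask minibatches log `Loss= -1.0000` and reward the degenerate solution:

```python
import numpy as np

def neg_dice(pred, target, eps=1e-5):
    # Negative dice coefficient, as logged above: -1.0 means perfect overlap.
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target)
    return -(2.0 * intersection + eps) / (union + eps)

empty_target = np.zeros((8, 8))   # image without any foreground
all_black = np.zeros((8, 8))      # the degenerate prediction
print(neg_dice(all_black, empty_target))  # -1.0: "perfect" on empty masks
```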

ChaoLi977 commented 5 years ago

I tried to make the dice_coefficient loss function range from 0 to 1 by writing `loss = 1.0 - (2 * intersection / union)`, but it still ends up the same. When I use cross_entropy as the loss function, I can train the net well. Maybe the problem is with the dice_coefficient loss function?

What learning rate do you use with the cross_entropy loss?

rytisss commented 5 years ago

`learning_rate = 0.0001` or a bit bigger, `optimizer = 'adam'`. (Screenshot of results attached.) It trains nicely. I have just run into one problem and want to try dice_coefficient with some of my data.

P.S. The dataset shown above (in the picture) is not nicely labeled; labeling errors appear.

jis478 commented 5 years ago

I think predicting all black is caused by the imbalanced nature of your training dataset. Can you try a modified version I've made based on this repo? I'm wondering whether my approach also works for other imbalanced problems.

You can simply try the below jupyter notebook file. https://github.com/jis478/Tensorflow/tree/master/Unet_modified/example/membrane/code/Unet_modified_execution.ipynb
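For comparison, another common remedy for class imbalance is class-weighted cross entropy. A generic NumPy sketch (my own illustration, not code from the linked notebook; all names are hypothetical):

```python
import numpy as np

def weighted_cross_entropy(probs, target, class_weights, eps=1e-7):
    """Pixel-wise cross entropy with per-class weights.

    probs         -- predicted class probabilities, shape (H, W, n_class)
    target        -- one-hot ground truth, shape (H, W, n_class)
    class_weights -- shape (n_class,); upweight the rare foreground class
    """
    per_pixel = -np.sum(target * np.log(probs + eps) * class_weights, axis=-1)
    return per_pixel.mean()
```

Upweighting the rare foreground class makes the all-background prediction more expensive, which pushes the network away from the all-black solution.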