wentaozhu / deep-mil-for-whole-mammogram-classification

Zhu, Wentao, Qi Lou, Yeeleng Scott Vang, and Xiaohui Xie. "Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification." MICCAI 2017.
MIT License

Validation Loss and Accuracy Goes Up and Down Steeply #13

Closed LLYXC closed 6 years ago

LLYXC commented 6 years ago

Hi Dr. Zhu, it is Luo; we have talked a bit over email before. Following your instruction, I trained the model for more epochs, i.e. 10,000 epochs this time. However, the problem remains: the training loss/accuracy goes down/up quite smoothly, while the validation loss goes up, and both validation loss and accuracy oscillate steeply.

I was using the code "run_cnn_k_mysparsemil_new.py" with a pre-trained AlexNet, and set the other hyper-parameters (learning rate, lambda, mu) to the values proposed in the MICCAI paper. I haven't made any changes to the code other than adapting it to my package versions. I am using the TensorFlow backend, so I already set image_data_format to 'channels_first'. I also checked the data from X_train and X_test before they were fed into the network; it looks fine, just like normal mammography images. I also set augmentation to True, which is important as discussed in your paper.

I pasted the hyper-parameter settings and screenshots from TensorBoard below. From the train/val curves, what I see is that the model is heavily overfitting the training data. Did you have the same problem on this task? Or is this because the target is so sparse in the image?

# Hyper Parameters
K.set_image_data_format('channels_first')
fold = 4  # 4
valfold = 0
lr = 5e-5
nb_epoch = 10000
batch_size = 80
l2factor = 5e-6
l1factor = 0#2e-7
usedream = False
weighted = False
noises = 50
data_augmentation = True
modelname = 'alexnet' # miccai16, alexnet, levynet, googlenet
pretrain = True
sparsemil = True
sparsemill1 = 1e-5 #1e-4
sparsemill2 = 0.0 #1e-2
savename = './logs/814/'+modelname+'60new_fd'+str(fold)+'_vf'+str(valfold)+'_lr'+str(lr)+'_l2'+str(l2factor)+'_l1'\
+str(l1factor)+'_ep'+str(nb_epoch)+'_bs'+str(batch_size)+'_w'+str(weighted)+'_dr'+str(usedream)+str(noises)+str(pretrain)+'_sp'+str(sparsemil)+str(sparsemill1)+str(sparsemill2)+'ft'
print(savename)
nb_classes = 2
img_rows, img_cols = 227, 227
img_channels = 1
# Augmentation Settings
datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False,  
        samplewise_std_normalization=False,  
        zca_whitening=False,  
        rotation_range=45.0,  
        width_shift_range=0.1, 
        height_shift_range=0.1, 
        horizontal_flip=True, 
        vertical_flip=True, 
        zerosquare=True,
        zerosquareh=noises,
        zerosquarew=noises,
        zerosquareintern=0.0)
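For reference, the sparse MIL objective referred to above (cross-entropy on the bag prediction plus an L1 sparsity penalty on the instance probabilities, weighted by sparsemill1) can be sketched roughly as follows; the patch grid and values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical per-patch malignancy probabilities for one mammogram
# (e.g. a 6x6 grid of instances from the last conv layer).
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 0.2, size=(6, 6))
p[2, 3] = 0.9  # one "malignant" patch

lam = 1e-5  # plays the role of sparsemill1 above

p_bag = p.max()                   # bag probability: max over instances
ce = -np.log(p_bag)               # cross-entropy for a positive bag (label = 1)
sparsity = lam * np.abs(p).sum()  # L1 penalty encouraging few active instances
loss = ce + sparsity
```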

Below is a screenshot from TensorBoard. [two TensorBoard screenshots attached]

wentaozhu commented 6 years ago

Did you normalize the input data?
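(A quick sanity check, assuming X_train holds raw 8-bit pixel arrays, might look like the following; the array shape here is just the one from the settings above:)

```python
import numpy as np

# Fake stand-in for X_train: 4 single-channel 227x227 mammograms, 8-bit values.
X_train = np.random.randint(0, 256, size=(4, 1, 227, 227)).astype('float32')
X_train /= 255.0  # scale pixel values into [0, 1]

# After normalization, every value should lie in [0, 1].
assert X_train.min() >= 0.0 and X_train.max() <= 1.0
```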

wentaozhu commented 6 years ago

Also, how about the max-pooling deep MIL?
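(For anyone reading along: the max-pooling deep MIL variant aggregates instance scores by simply taking the maximum. A minimal NumPy sketch, with a made-up patch grid:)

```python
import numpy as np

# Hypothetical per-patch malignancy scores for one mammogram,
# e.g. softmax outputs over a 6x6 grid of instances.
rng = np.random.default_rng(1)
scores = rng.uniform(0.0, 0.3, size=(6, 6))
scores[4, 1] = 0.95  # a single suspicious patch

# Max-pooling MIL: the whole-image (bag) prediction is the maximum
# instance score, so one malignant patch flags the entire mammogram.
bag_score = scores.max()
bag_label = int(bag_score > 0.5)
```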

LLYXC commented 6 years ago

Thanks for the advice! Since it's midnight in my time zone, I shall check the normalization right after I get to my office tomorrow. For now I have only tried the sparse MIL model; I will also give the max-pooling method a try tomorrow.

Thank you very much!

LLYXC commented 6 years ago

Hi Dr. Zhu. I checked the code; the images have indeed been normalized into the range [0, 1]. Actually, the contrast between the lesion and the background tissue is already quite high. I will also check the max-pooling code today.

LLYXC commented 6 years ago

Hi Dr. Zhu, the max-pooling code has some problems during validation, so I haven't got its results yet. I am still very confused by the curves I pasted above. Could you kindly check them and give some advice? Thank you very much.

wentaozhu commented 6 years ago

Now I am helping you debug. What is your best validation accuracy? Is it close to the number in the paper?

LLYXC commented 6 years ago

Sorry for giving you that impression; having someone else debug for me is the last thing I want. The best validation AUC is about 0.85, which I know is comparable to the results in the paper; I'll check the best accuracy later. The validation fold is 4 and the training fold is 0. The only thing I want to discuss is: is this kind of learning curve common in tasks like this, where the dataset is small and the target is sparse? I also observe such curves on other datasets, with or without a designed sparsity layer. I didn't debug the max-pooling code because I was occupied by other things. Your work is really impressive, and I really want to follow up on deep MIL. Every piece of your advice is helpful. Thank you very much!

wentaozhu commented 6 years ago

Congratulations! You reproduced my results. If you want the curves to be smoother, more "beautiful", you can tune where to decrease the learning rate. You can see the ROC curve is not smooth either. The dataset is small, and the number of test samples is small too, so the accuracy curve bounces up and down a bit; with a big dataset it would be better. If you want to continue working on deep multiple instance learning, it is better to focus on the learning model itself; the data is just there to validate your idea. Good luck! You will be fine because you are hard-working!
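(One concrete way to "tune where to decrease the learning rate" is a step-decay schedule; a minimal sketch, where the base rate matches the settings above but the drop factor and epoch boundaries are hypothetical:)

```python
def step_decay(epoch, base_lr=5e-5, drop=0.1, epochs_per_drop=4000):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return base_lr * drop ** (epoch // epochs_per_drop)

# With Keras this could plug in via
#   keras.callbacks.LearningRateScheduler(lambda epoch: step_decay(epoch))
```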

LLYXC commented 6 years ago

Thank you very much, Dr. Zhu! I will gratefully take your suggestion. Many times I focus too much on the dataset and the performance. You are right that researchers should focus more on validating the idea itself against a baseline, and I couldn't agree more. Thanks again! I hope we can meet someday and talk more!

wentaozhu commented 6 years ago

Cool! Good luck!