wentaozhu / deep-mil-for-whole-mammogram-classification

Zhu, Wentao, Qi Lou, Yeeleng Scott Vang, and Xiaohui Xie. "Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification." MICCAI 2017.
MIT License

Which Python code is for label assignment based MIL? #11

Closed: daisy647lsq closed this issue 6 years ago

daisy647lsq commented 6 years ago

Hi Wentao, I've read through your paper on deep MIL for mammogram mass classification and found that there are three proposed methods: 1) max pooling based; 2) label assignment based; and 3) sparse MIL.

From your published code, I assume "run_cnn_k_mil_new.py" is for 1) and "run_cnn_k_mysparsemil_new.py" is for 3), but I can't find the code for 2). Please advise.

wentaozhu commented 6 years ago

run_cnn_k_new.py is used for AlexNet. run_cnn_k_mil_new.py is used for max pooling based deep MIL. run_cnn_k_mysparsemil_new.py is used for sparse deep MIL. run_cnn_k_mymil_new.py is used for label assignment based deep MIL. Here we fine-tuned the weights from the max pooling based deep MIL.
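For reference, here is a minimal NumPy sketch of how the three MIL losses differ, based on a reading of the paper rather than the repo's Keras implementation; `inst_probs` stands for the per-patch malignant probabilities, and the function names, `k`, and `mu` values below are illustrative only.

```python
import numpy as np

def max_pooling_mil_loss(inst_probs, y):
    # Max pooling based deep MIL: the image-level malignant probability is
    # the maximum over all instance (patch) probabilities.
    p = np.clip(np.max(inst_probs), 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def label_assignment_mil_loss(inst_probs, y, k=4):
    # Label assignment based deep MIL: rank the instance probabilities, give
    # the top-k instances the image label and the rest label 0, then average
    # the per-instance cross-entropy.
    order = np.argsort(inst_probs)[::-1]          # descending by probability
    labels = np.zeros_like(inst_probs)
    labels[order[:k]] = y                         # top-k inherit the image label
    p = np.clip(inst_probs, 1e-7, 1 - 1e-7)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def sparse_mil_loss(inst_probs, y, mu=1e-3):
    # Sparse deep MIL: image-level loss plus an L1 sparsity penalty that
    # pushes most instance probabilities toward zero.
    return max_pooling_mil_loss(inst_probs, y) + mu * np.sum(np.abs(inst_probs))

# Toy usage: a flattened 6x6 grid of patch probabilities for one mammogram.
probs = np.random.rand(36)
print(max_pooling_mil_loss(probs, 1),
      label_assignment_mil_loss(probs, 1),
      sparse_mil_loss(probs, 1))
```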

daisy647lsq commented 6 years ago

Hi Wentao, thanks for your reply. I'm trying to reproduce the results in Table 1 of your paper. After running your code, I realised that 6 different models are saved during training (i.e. a model for the best accuracy, a model for the best AUC, best precision, best F1, etc.). May I know how you chose among the 6 models to obtain the AUC results in Table 1? Is it by averaging over the 6 models, or by taking the model with the best AUC on the validation data?

In addition, I understand that you used five-fold cross validation: 3 folds for training, 1 fold for validation, and 1 fold for testing. My question is: for each test set, how did you choose which fold to use for validation? (i.e. when fold 2 is for testing, is the validation fold fold 0, fold 1, fold 3, or fold 4?)

wentaozhu commented 6 years ago

I used the best AUC or the best accuracy model. You can test it.

I think I used fold 1 as the validation fold.
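For anyone reproducing Table 1, a small sketch of how this checkpoint selection can be done, assuming the validation and test probabilities of each saved candidate model were dumped to .npy files (the candidate names and file paths here are hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical names for the 6 saved checkpoints.
CANDIDATES = ["best_acc", "best_auc", "best_precision", "best_recall", "best_f1", "best_loss"]

def pick_by_val_auc(y_val, y_test, pred_dir="preds"):
    # Choose the checkpoint with the highest AUC on the validation fold,
    # then report its AUC on the held-out test fold.
    best_name, best_val_auc = None, -1.0
    for name in CANDIDATES:
        val_probs = np.load(f"{pred_dir}/{name}_val.npy")
        val_auc = roc_auc_score(y_val, val_probs)
        if val_auc > best_val_auc:
            best_name, best_val_auc = name, val_auc
    test_probs = np.load(f"{pred_dir}/{best_name}_test.npy")
    return best_name, best_val_auc, roc_auc_score(y_test, test_probs)
```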

daisy647lsq commented 6 years ago

So for the different test sets (i.e. fold 0 to fold 4), did you always use fold 1 for validation? Or is there a pattern you followed to choose the test and validation sets? (i.e. fold 1 for validation and fold 2 for testing; fold 2 for validation and fold 3 for testing; fold 3 for validation and fold 4 for testing, etc.)

I tried using the best AUC model for testing, but I did not achieve the AUC reported in Table 1 (I get 79% for pre-trained AlexNet + max pooling, which is a bit lower than your result of 81%). Please advise me on how you achieved the results. Really appreciate it!

wentaozhu commented 6 years ago

You may try different validation sets and see how the performance changes.

Congratulations! You are almost there! How about using more candidate models, like best accuracy, best F1, ...

Check the number of positive samples. Make sure the number is 100, not 94; I made a mistake in the first version but corrected it later.
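A quick sanity check for the positive-sample count mentioned above, assuming the image-level labels can be loaded as a NumPy array (the file name is hypothetical):

```python
import numpy as np

labels = np.load("image_labels.npy")   # assumed dump of image-level labels, 1 = positive
n_pos = int(np.sum(labels == 1))
print("positive samples:", n_pos)
assert n_pos == 100, f"expected 100 positive samples, got {n_pos}"
```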

daisy647lsq commented 6 years ago

Thanks Wentao. I have another question regarding your method. In Table 1, the best AUC is achieved by "Pretrained AlexNet + Sparse MIL + Bagging". Is the code with bagging also available in your GitHub repo?

wentaozhu commented 6 years ago

You may use different validation sets and get a sense of the performance. Then you can try to use majority voting for averaging for the bagging. Getting the best performance is a struggle (try multiple runs and experiments), but you can always get similar or slightly lower results without tuning.

wentaozhu commented 6 years ago

Or averaging. Sorry for the typo.
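To make the bagging step concrete, here is a minimal sketch that combines the models trained with different validation folds, either by averaging their predicted probabilities or by majority voting, assuming each model's test-fold probabilities were saved to .npy files (the paths and fold indices are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Each row holds one fold-model's predicted malignant probabilities on the shared test fold.
fold_probs = np.stack([np.load(f"preds/sparse_mil_val{v}_test.npy") for v in (0, 1, 3, 4)])
y_test = np.load("labels_test.npy")

# Bagging by averaging the predicted probabilities.
avg_probs = fold_probs.mean(axis=0)
print("bagged (averaging) AUC:", roc_auc_score(y_test, avg_probs))

# Bagging by majority voting: threshold each model at 0.5 and use the vote fraction as the score.
votes = (fold_probs >= 0.5).mean(axis=0)
print("bagged (majority vote) AUC:", roc_auc_score(y_test, votes))
```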