run_cnn_k_new.py is used for AlexNet. run_cnn_k_mil_new.py is used for max-pooling-based deep MIL. run_cnn_k_mysparsemil_new.py is used for sparse deep MIL. run_cnn_k_mymil_new.py is used for label-assignment-based deep MIL. Here we fine-tuned weights from the max-pooling-based deep MIL.
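To make the difference between the pooling variants concrete, here is a minimal sketch (not the repository code; the per-patch scores are made up) of how the image-level prediction is formed:

```python
import numpy as np

# Made-up per-patch malignancy scores for one mammogram, standing in
# for the sigmoid outputs of the network's last layer (one per patch).
instance_probs = np.array([0.05, 0.12, 0.91, 0.33])

# Max-pooling-based deep MIL: the image-level probability is the
# maximum over all patch probabilities.
image_prob = instance_probs.max()

# Sparse deep MIL: same idea, but the training loss additionally
# penalises the instance probabilities (e.g. an L1 term) so that
# only a few patches carry high scores.
sparsity_term = np.abs(instance_probs).sum()
```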
Hi Wentao, thanks for your reply. I'm trying to reproduce the results in Table 1 of your paper. After running your code, I realised that six different models are created during training (i.e. the model with the best accuracy, best AUC, best precision, best F1, etc.). May I know how you chose among the six models to obtain the AUC results in Table 1? Did you average over the six models, or just pick the model with the best AUC on the validation data?
In addition, I understand that you used five-fold cross-validation: 3 folds for training, 1 fold for validation, and 1 fold for testing. My question is: for each test set, how did you choose which validation set to use? (e.g. when fold 2 is for testing, is the validation fold fold 0, fold 1, fold 3, or fold 4?)
I used the best-AUC or best-accuracy model. You can test both.
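As an illustration, a minimal sketch of that selection step, assuming Keras-style checkpoints with hypothetical file names (the actual names in the repository may differ):

```python
from keras.models import load_model           # assumes a Keras setup
from sklearn.metrics import roc_auc_score

# Hypothetical checkpoint names: training saves one model per tracked
# validation metric (accuracy, AUC, precision, recall, F1, loss).
CHECKPOINTS = ["best_acc.h5", "best_auc.h5", "best_precision.h5",
               "best_recall.h5", "best_f1.h5", "best_loss.h5"]

def pick_checkpoint(x_val, y_val, paths=CHECKPOINTS):
    """Return the saved model with the highest AUC on the validation
    fold; only that model is then evaluated on the test fold."""
    def auc_of(path):
        model = load_model(path)
        probs = model.predict(x_val).ravel()  # predicted malignancy probs
        return roc_auc_score(y_val, probs)
    return max(paths, key=auc_of)
```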
I think I used fold 1 as validation.
So for the different test sets (i.e. fold 0 to fold 4), did you always use fold 1 for validation? Or is there a pattern you followed to pair test and validation sets? (e.g. fold 1 for validation and fold 2 for testing; fold 2 for validation and fold 3 for testing; fold 3 for validation and fold 4 for testing, etc.)
I tried using the best-AUC model for testing, but I did not reach the AUC reported in Table 1: I got 79% for pre-trained AlexNet + max pooling, which is a bit lower than your 81%. Please advise me on how you achieved that result. Really appreciate it!
You may try different validation sets and see how the performance changes.
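For instance, one simple rotation scheme would be the following; this is only an assumption, since the author confirmed using fold 1 for validation but not a full pairing pattern:

```python
# One possible rotation: for each test fold t, validate on the next
# fold and train on the remaining three.
n_folds = 5
for test_fold in range(n_folds):
    val_fold = (test_fold + 1) % n_folds
    train_folds = [f for f in range(n_folds)
                   if f not in (test_fold, val_fold)]
    print("test:", test_fold, "val:", val_fold, "train:", train_folds)
```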
Congratulations! You are almost there! How about using more candidate models, like best accuracy, best F1, ...?
Check the number of positive samples. Make sure the number is 100, not 94. I made a mistake in the first version, but I corrected it later on.
Thanks Wentao. I have another question regarding your method. In Table 1, the best AUC performance is achieved by "Pretrained AlexNet + Sparse MIL + Bagging". Is the code with bagging also available on your GitHub?
You may use different validation sets to get a sense of the performance. Then you can try majority voting or averaging for the bagging. Getting the best performance is a struggle (it takes several runs and experiments), but you can always get similar or slightly lower results without tuning.
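A minimal sketch of both bagging variants, using made-up probabilities from three hypothetical models (e.g. models trained with different validation folds):

```python
import numpy as np

# Made-up predictions: one row per model, one column per test mammogram.
probs = np.array([[0.80, 0.30, 0.60],
                  [0.70, 0.40, 0.20],
                  [0.90, 0.10, 0.70]])

# Bagging by averaging: mean predicted probability across models.
avg_prob = probs.mean(axis=0)                 # -> [0.8, 0.2667, 0.5]

# Bagging by majority voting: threshold each model at 0.5, then take
# the majority class over the votes.
votes = (probs > 0.5).astype(int)
majority = (votes.sum(axis=0) * 2 > len(votes)).astype(int)  # -> [1, 0, 1]
```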
Hi Wentao, I've read through your paper on deep MIL for mammogram mass classification and found that there are three proposed methods: 1) max-pooling based; 2) label-assignment based; and 3) sparse MIL.
From your published code, I assume "run_cnn_k_mil_new.py" is for 1) and "run_cnn_k_mysparsemil_new.py" is for 3). But I can't find the code for 2); please advise.