ymli39 / DeepSEED-3D-ConvNets-for-Pulmonary-Nodule-Detection

DeepSEED: 3D Squeeze-and-Excitation Encoder-Decoder ConvNets for Pulmonary Nodule Detection

Need some clarification on test_result #18

Closed shakjm closed 3 years ago

shakjm commented 4 years ago

Hi Mr. Li,

I would like to know how you achieved a sensitivity of 0.93 with only 80 true positives and 1218 false positives, as reported in your test_result folder. You have only posted results for the subset9 folder. Shouldn't the FROC calculation use results from subset0 through subset9? Could you clarify this? I recall LUNA16 having 1187 nodules listed in the annotations CSV file.

Thank you.

PRCinguhou commented 4 years ago

Should there be 10 .ckpt files for testing?

shakjm commented 4 years ago

> Should there be 10 .ckpt files for testing?

As I understand it, there should be 10 different checkpoints, one per model (with subset0 through subset9 as the respective test folds), because that is how 10-fold cross-validation works. All of the predictions from the 10 models should then be merged into one result for evaluation. This is why I would like to know whether I've made a mistake somewhere.

shakjm commented 4 years ago

> Should there be 10 .ckpt files for testing?

I would love to know whether you are also trying to reproduce the model's results. Maybe you can share your results with me? I have been studying this codebase for almost a year now.

PRCinguhou commented 4 years ago

> Should there be 10 .ckpt files for testing?
>
> As I understand it, there should be 10 different checkpoints, one per model (with subset0 through subset9 as the respective test folds), because that is how 10-fold cross-validation works. All of the predictions from the 10 models should then be merged into one result for evaluation. This is why I would like to know whether I've made a mistake somewhere.

Yes, I thought the same. But my model's CPM only reaches 0.80 at the moment; I'm still working on it.

shakjm commented 4 years ago

> Should there be 10 .ckpt files for testing?
>
> As I understand it, there should be 10 different checkpoints, one per model (with subset0 through subset9 as the respective test folds), because that is how 10-fold cross-validation works. All of the predictions from the 10 models should then be merged into one result for evaluation. This is why I would like to know whether I've made a mistake somewhere.
>
> Yes, I thought the same. But my model's CPM only reaches 0.80 at the moment; I'm still working on it.

Hopefully Mr. Li will be able to assist with this. I've worked with his model for quite a long time and still haven't reproduced his published results.

Mr. Li, I would appreciate it if you could share all 10 checkpoints with us. Thank you.

ymli39 commented 4 years ago

> Should there be 10 .ckpt files for testing?
>
> As I understand it, there should be 10 different checkpoints, one per model (with subset0 through subset9 as the respective test folds), because that is how 10-fold cross-validation works. All of the predictions from the 10 models should then be merged into one result for evaluation. This is why I would like to know whether I've made a mistake somewhere.

Yes, you understand it correctly. For my training I randomly split the data into 10 folds, evaluated on each fold, and then averaged the results. You could merge them into one file; the results might change a little, but not by much.
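For anyone following along, here is a minimal sketch (not the repo's own script) of merging the per-fold prediction CSVs into a single submission file, as the LUNA16 FROC evaluation expects. The file names `fold_0_predictions.csv` through `fold_9_predictions.csv` and `merged_predictions.csv` are placeholders; the column names follow the LUNA16 submission format.

```python
# Minimal sketch: merge 10 per-fold prediction CSVs into one file for the
# LUNA16 FROC script. File names are placeholders; the repo's actual output
# names may differ.
import pandas as pd

COLUMNS = ["seriesuid", "coordX", "coordY", "coordZ", "probability"]

# Read seriesuid as a string so IDs are never reinterpreted as numbers.
folds = [
    pd.read_csv(f"fold_{i}_predictions.csv", dtype={"seriesuid": str})
    for i in range(10)
]

merged = pd.concat(folds, ignore_index=True)[COLUMNS]
merged.to_csv("merged_predictions.csv", index=False)
```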

For evaluation, I downloaded the official LUNA script and its demo for testing. I used a Linux system; however, the script reported low results when I ran it on my machine. I eventually found that it matches the two ID values differently on my system: the ground truth has the ID 001 and the prediction has the ID 001, but on my end the latter only shows up as 1. So even when a prediction is correct, the evaluation script fails to match it, which results in a low FROC number. You could check whether the same issue happened to you.

For the other trained models, I need to check my server. I cleaned it up a year ago due to limited storage space, and I did this project two years ago.

ymli39 commented 3 years ago

> Hi Mr. Li,
>
> I would like to know how you achieved a sensitivity of 0.93 with only 80 true positives and 1218 false positives, as reported in your test_result folder. You have only posted results for the subset9 folder. Shouldn't the FROC calculation use results from subset0 through subset9? Could you clarify this? I recall LUNA16 having 1187 nodules listed in the annotations CSV file.
>
> Thank you.

I had the same confusion before, but when I stepped through the FROC code released on the LUNA website, I saw that sensitivity is computed only from the true positives; the false positives have no impact on the sensitivity value. You could step through the FROC evaluation script to see how it works.
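To illustrate that point, here is a simplified sketch of how one FROC operating point can be computed. This is not the official LUNA16 evaluation script, and the counts of 86 nodules and 88 scans in the fold are assumptions chosen only so that 80 hits yield roughly 0.93.

```python
# Simplified illustration of one FROC operating point: sensitivity depends only
# on how many ground-truth nodules were hit, while false positives only set the
# FP-per-scan coordinate. Not the official LUNA16 script.
import numpy as np

def froc_point(hit_probs, fp_probs, n_nodules, n_scans, threshold):
    """Return (sensitivity, FPs per scan) at one probability threshold."""
    tp = int(np.sum(np.asarray(hit_probs) >= threshold))
    fp = int(np.sum(np.asarray(fp_probs) >= threshold))
    return tp / n_nodules, fp / n_scans  # false positives never enter the sensitivity

# Toy numbers: 80 candidates that hit a nodule, 1218 that did not.
# 86 nodules and 88 scans are assumed values, picked so 80/86 is about 0.93.
sens, fps = froc_point([0.9] * 80, [0.9] * 1218, n_nodules=86, n_scans=88, threshold=0.5)
print(f"sensitivity = {sens:.3f}, FPs/scan = {fps:.1f}")
```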

shakjm commented 3 years ago

> Should there be 10 .ckpt files for testing?
>
> As I understand it, there should be 10 different checkpoints, one per model (with subset0 through subset9 as the respective test folds), because that is how 10-fold cross-validation works. All of the predictions from the 10 models should then be merged into one result for evaluation. This is why I would like to know whether I've made a mistake somewhere.
>
> Yes, you understand it correctly. For my training I randomly split the data into 10 folds, evaluated on each fold, and then averaged the results. You could merge them into one file; the results might change a little, but not by much.
>
> For evaluation, I downloaded the official LUNA script and its demo for testing. I used a Linux system; however, the script reported low results when I ran it on my machine. I eventually found that it matches the two ID values differently on my system: the ground truth has the ID 001 and the prediction has the ID 001, but on my end the latter only shows up as 1. So even when a prediction is correct, the evaluation script fails to match it, which results in a low FROC number. You could check whether the same issue happened to you.
>
> For the other trained models, I need to check my server. I cleaned it up a year ago due to limited storage space, and I did this project two years ago.

Hi Mr. Li, you're right: the same issue occurred because of the numbering styles. I changed it to use the original seriesuid instead, which solved the problem.
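In case it helps others hitting the same mismatch, a minimal sketch of this kind of fix: force the seriesuid column in both the ground-truth and the prediction CSVs to be read as plain strings, so an ID stored as `001` is never collapsed to `1`. The file names below are placeholders.

```python
# Sketch of the seriesuid fix: read the ID column as a string in both files so
# the FROC matcher compares identical values. File names are placeholders.
import pandas as pd

gt = pd.read_csv("annotations.csv", dtype={"seriesuid": str})
pred = pd.read_csv("predictions.csv", dtype={"seriesuid": str})

# Sanity check: every predicted scan ID should also appear in the ground truth.
missing = set(pred["seriesuid"]) - set(gt["seriesuid"])
print(f"{len(missing)} predicted seriesuids have no match in annotations.csv")
```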

If possible, I would really appreciate it if you could share the models with us; otherwise, that's completely fine.

> Hi Mr. Li, I would like to know how you achieved a sensitivity of 0.93 with only 80 true positives and 1218 false positives, as reported in your test_result folder. You have only posted results for the subset9 folder. Shouldn't the FROC calculation use results from subset0 through subset9? Could you clarify this? I recall LUNA16 having 1187 nodules listed in the annotations CSV file. Thank you.
>
> I had the same confusion before, but when I stepped through the FROC code released on the LUNA website, I saw that sensitivity is computed only from the true positives; the false positives have no impact on the sensitivity value. You could step through the FROC evaluation script to see how it works.

Noted, and thank you for sharing. However, I believe you would need to detect a total of 1103 nodules across all 10 cross-validation folds to achieve a sensitivity of 0.93. Do correct me if I am wrong, as I am also trying to find out more about this.
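(As a rough sanity check on that figure, using the 1187 nodule count quoted earlier in this thread: 1103 / 1187 ≈ 0.929, which is consistent with the reported sensitivity of 0.93.)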

Appreciate the replies. Take care!

ymli39 commented 3 years ago

"However, I do believe that you'll need to detect a total of 1103 nodules by looping through all 10-fold cross validation results to achieve the sensitivity of 0.93."

Yes, you are correct.

esmasert commented 1 year ago

Hello @shakjm, how did you correct the numbering styles? Thank you.