ymli39 / DeepSEED-3D-ConvNets-for-Pulmonary-Nodule-Detection

DeepSEED: 3D Squeeze-and-Excitation Encoder-Decoder ConvNets for Pulmonary Nodule Detection
MIT License

problems about the test results of 207.ckpt on luna16 subset 9 #34

Closed Senyh closed 1 year ago

Senyh commented 2 years ago

Dear Dr. Li,

I tested the 207.ckpt checkpoint you provided on LUNA16 subset9 but got a poor CPM score. The results are as follows:


CAD Analysis: predanno0.3


Candidate detection results:
True positives: 98
False positives: 1316
False negatives: 7
True negatives: 0
Total number of candidates: 1702
Total number of nodules: 105
Ignored candidates on excluded nodules: 271
Ignored candidates which were double detections on a nodule: 17
Sensitivity: 0.933333333
Average number of candidates per scan: 19.340909091

[Figure: FROC curve for predanno0.3]

So strange! The sensitivity is about 93.3% and the average number of candidates per scan is only 19, but the CPM is very poor (0.18???). I also retrained the method on LUNA16 subset0-8 and tested it on subset9, but the evaluation results are just as odd:


CAD Analysis: predanno0.3


Candidate detection results:
True positives: 100
False positives: 1792
False negatives: 5
True negatives: 0
Total number of candidates: 2342
Total number of nodules: 105
Ignored candidates on excluded nodules: 415
Ignored candidates which were double detections on a nodule: 35
Sensitivity: 0.952380952
Average number of candidates per scan: 26.613636364

[Figure: FROC curve for predanno0.3]

I wonder if you could give me some advice. I could not find your email address in the paper. Could you share it with me at your convenience?

Thank you in advance!

xxszqyy@gmail.com

ymli39 commented 2 years ago

I think something is wrong with your nodule evaluation code.

As you can see from your reported results, there are 98 true positives out of 105 nodules, so the sensitivity shouldn't be that low.

If you are using the script provided on the LUNA website, make sure the code runs correctly on their example dataset. Three years ago they provided example data for testing the evaluation script.

Senyh commented 2 years ago

Thank you. I used the official evaluation code. Moreover, I obtained a normal CPM when I evaluated the detection CSV file provided by DeepLung (Zhu et al., WACV 2018). The overall sensitivity is normal (about 93%), as shown above, but the CPM score is low. Specifically, the sensitivity is very low at 0.125, 0.25, and 1 false positives per scan.
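For readers comparing the two numbers: the overall sensitivity is measured over all candidates, while the LUNA16 CPM averages the sensitivity at seven operating points (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). The sketch below is a simplified approximation, not the official evaluation script. The function names and the toy curve are made up for illustration and are not taken from this thread's results. It shows how a high overall sensitivity can coexist with a low CPM when true positives receive low confidence scores.

```python
from bisect import bisect_left

# The seven false-positives-per-scan operating points averaged by the CPM.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def interp(x, xs, ys):
    """Linearly interpolate ys over sorted xs, clamping at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cpm_from_froc(fps_per_scan, sensitivity):
    """Return (CPM, per-point sensitivities) for a FROC curve.

    fps_per_scan must be sorted in increasing order.
    """
    per_point = [interp(r, fps_per_scan, sensitivity) for r in CPM_FP_RATES]
    return sum(per_point) / len(per_point), per_point


# Hypothetical FROC curve: sensitivity only becomes high at ~15 FPs per scan.
toy_fps = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0]
toy_sens = [0.02, 0.03, 0.05, 0.08, 0.20, 0.40, 0.60, 0.93]

cpm, per_point = cpm_from_froc(toy_fps, toy_sens)
print(f"max sensitivity: {max(toy_sens):.2f}, CPM: {cpm:.3f}")
```

In this toy case the curve reaches 0.93 sensitivity overall, yet the CPM stays far lower because none of the seven operating points lies in the high-sensitivity region.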

ymli39 commented 2 years ago

Have you tested the results with the CSV I provided?

One possible reason is an index mismatch between the prediction script and the evaluation script. I am not sure whether you ran the evaluation with the evaluation script I provided. The only difference is this: with the official script, an index of 001 is treated as 1, whereas in the ground truth it is read as 001. My script corrects this issue.
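A minimal illustration of the kind of mismatch described here, assuming the case IDs are compared as strings. The ID values and variable names are hypothetical, and this is not code from either evaluation script. Once an ID such as 001 has been parsed as a number, its leading zeros are gone and a string lookup against the ground truth no longer finds it.

```python
# Hypothetical case IDs as they might appear in a ground-truth file.
gt_ids = ["001", "056", "123"]

# The same cases after the IDs have passed through a numeric conversion,
# which silently drops the leading zeros.
pred_ids = [str(int(s)) for s in ["001", "056", "123"]]

# A plain string comparison now matches only the ID that had no leading zeros.
gt_lookup = set(gt_ids)
matched = [p for p in pred_ids if p in gt_lookup]
missed = [g for g in gt_ids if g not in set(pred_ids)]

print("predicted IDs:", pred_ids)
print("matched:", matched)
print("ground-truth cases without a match:", missed)
```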

Senyh commented 2 years ago

I tried to test the CSV you provided, but the predanno0.3 file does not contain results for all 1186 LUNA16 nodules, only for part of the test set, and I cannot find the annotation and series UID CSV files for that set.

kacel33 commented 1 year ago

I also ran a test with the provided 177.ckpt, and the result was similar. I then trained for 150 epochs, and the result was again similar. I think this code is wrong.

kacel33 commented 1 year ago

[Image: screenshot, not preserved]

In this code, false positives cannot be distinguished.

ymli39 commented 1 year ago

Hi all, I have updated the evaluation code if you want to check it out. The major issue was a label mismatch between the predictions and the ground truth. For example, ID 56 is '56' in the prediction file but '056' in the ground-truth file, so the script did not match the predicted '56' to case '056'. This issue has now been fixed.
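For anyone who wants to check their own CSV files before running the evaluation, here is a rough sketch of one way to normalize IDs and list predictions that have no ground-truth counterpart. It is not the fix committed to this repository. The helper names and the default width of 3 are assumptions based on the '056' example above.

```python
def normalize_case_id(case_id, width=3):
    """Zero-pad purely numeric IDs; leave anything else (e.g. a full UID) as-is."""
    text = str(case_id).strip()
    return text.zfill(width) if text.isdigit() else text


def unmatched_ids(pred_ids, gt_ids, width=3):
    """Return normalized predicted IDs that are absent from the ground truth."""
    gt = {normalize_case_id(g, width) for g in gt_ids}
    pred = {normalize_case_id(p, width) for p in pred_ids}
    return sorted(pred - gt)


# Hypothetical IDs: '56' and '7' should match '056' and '007' after padding.
print(unmatched_ids(["56", "7", "999"], ["056", "007"]))
```

An empty list from `unmatched_ids` means every predicted case has a ground-truth entry under the same normalized ID. Any remaining entries are worth inspecting by hand.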