Open AICVlab opened 4 months ago

It seems there is a bug in the evaluation code; once it is fixed, the score becomes much lower. In the function `def benchmark()` the batch size is set to 32, but the batch size of the dataloader is set to 1. This setting means that only the top candidates are evaluated and candidates with low confidence are never selected, which leads to the high score reported in the paper. This evaluation is cheating.
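To make the concern concrete, here is a minimal, self-contained sketch (this is not the repository's code; the masks, confidences, and the `top_k` cut-off are all invented for illustration) showing that averaging Dice only over the most confident candidates reports a noticeably higher number than averaging over every candidate:

```python
import numpy as np

rng = np.random.default_rng(0)

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

# Hypothetical ground truth and 8 candidate masks of decreasing quality;
# confidence is assumed to track mask quality.
gt = np.zeros((64, 64), dtype=bool)
gt[16:48, 16:48] = True

candidates = []
for i in range(8):
    noise = rng.random(gt.shape) < (0.05 * i)   # more noise -> worse mask
    mask = np.logical_xor(gt, noise)
    confidence = 1.0 - 0.1 * i
    candidates.append((confidence, mask))

# Sort by confidence, as an evaluation loop that ranks candidates might do.
candidates.sort(key=lambda c: c[0], reverse=True)

top_k = 2  # stands in for the subset a batch-size mismatch could silently keep
score_top = np.mean([dice(m, gt) for _, m in candidates[:top_k]])
score_all = np.mean([dice(m, gt) for _, m in candidates])

print(f"Dice over top-{top_k} most confident candidates: {score_top:.3f}")
print(f"Dice over all candidates:                        {score_all:.3f}")
```

Whether this is exactly what happens inside `benchmark()` depends on its actual slicing logic, which only the maintainers can confirm.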
Hi @AICVlab , this code may indeed have some problems, and I will fix it. As for cheating in the evaluation: I would not do that, because if I had, I would not have made my implementation and model public. Please be careful with your words.
Thank you so much for this issue :)
After one week, have you fixed this problem and re-evaluated your performance metrics? After our correction, the Dice score is below 0.90 (0.95 was reported in your paper). The difference is huge.
Hi @AICVlab , thanks for your patience. I am working on fixing this issue; it may be a little slow because I have some other things going on as well. Sorry for the inconvenience.