Open Johnsonj0308 opened 4 months ago
Hi @Johnsonj0308, thank you so much for your help in fixing the evaluation. I will look into this issue and post an update.
Hello, I saw on Papers with Code that your Dice coefficient currently ranks first, but I found the same problems in benchmark.py. I hope you can correct them promptly and respond. Thank you!
Issue Description
Hello, while using benchmark.py I noticed an anomaly: testing finished unusually fast. On closer inspection of benchmark.py, I identified a bug.
In your benchmark() function, BATCH_SIZE defaults to 32, but BATCH_SIZE is not set to 1 when the function is called, while the dataset itself is built with a batch size of 1. As a result, model.evaluate(test_dataset, steps=steps_per_epoch) runs steps = len_data // 32 instead of len_data // 1, so only a small fraction of the test data is ever read. In addition, because build_dataset is called without shuffle=False, the subset that gets evaluated changes on every run, so the reported performance varies between executions.
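The mismatch described above can be illustrated with plain arithmetic (the test-set size of 1000 below is a hypothetical number, not taken from the repository):

```python
# Demonstration of the evaluation bug: the dataset is batched with
# size 1, but `steps` is computed from the default BATCH_SIZE of 32,
# so model.evaluate() only ever sees a fraction of the test set.

len_data = 1000          # assumed size of the test split (hypothetical)
dataset_batch_size = 1   # batch size actually used when building the dataset
default_batch_size = 32  # BATCH_SIZE default inside benchmark()

# Steps as computed by benchmark.py (paired with the wrong batch size):
steps_per_epoch = len_data // default_batch_size      # 31

# Samples that evaluation actually covers:
samples_evaluated = steps_per_epoch * dataset_batch_size  # 31 of 1000

# Steps needed to cover the full test set at batch size 1:
correct_steps = len_data // dataset_batch_size        # 1000
```

With shuffling enabled, *which* 31 samples land in that evaluated slice differs on every run, which explains the run-to-run variance in the metrics.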
Fix
In benchmark(), set BATCH_SIZE to 1 so that it matches the batch size of the dataset, and pass shuffle=False when calling build_dataset so that evaluation is deterministic.
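A minimal sketch of the proposed fix, using plain-Python stand-ins for the repository's build_dataset and model (everything except the names BATCH_SIZE and steps_per_epoch is an assumption, not the repo's actual API):

```python
def build_dataset(samples, batch_size=1, shuffle=False):
    """Stand-in for the repo's build_dataset: slices a sample list
    into batches. shuffle=False keeps evaluation deterministic."""
    if shuffle:
        raise ValueError("evaluation should not shuffle the test set")
    return [samples[i:i + batch_size]
            for i in range(0, len(samples), batch_size)]

def benchmark(samples, BATCH_SIZE=1):  # fix: default to 1, matching the dataset
    test_dataset = build_dataset(samples, batch_size=BATCH_SIZE, shuffle=False)
    # steps_per_epoch now uses the same batch size the dataset was built with,
    # so evaluation covers the whole test set instead of a random fraction.
    steps_per_epoch = len(samples) // BATCH_SIZE
    n_seen = sum(len(batch) for batch in test_dataset[:steps_per_epoch])
    return steps_per_epoch, n_seen

steps, n_seen = benchmark(list(range(100)))
# steps == 100 and n_seen == 100: every test sample is evaluated exactly once
```

The key invariant is that the divisor in `steps_per_epoch` and the batch size passed to `build_dataset` come from the same variable, so they cannot drift apart again.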
Model Weights
I experimented with three sets of model weights:
Among these, option 1 (your provided pretrained weights) performed best.
Test Results Comparison (Kvasir)
| Metric | Before fix | After fix |
|---|---|---|
| dice_coeff | 0.9572 | 0.9049 |
| bce_dice_loss | 0.2784 | 0.3448 |
| IoU | 0.9183 | 0.8481 |
| zero_IoU | 0.9748 | 0.9700 |
| mean_squared_error | 0.0184 | 0.0222 |
Example Usage of benchmark.py
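The usage example itself did not survive in the thread. As a placeholder, the snippet below only assembles a hypothetical command line; the flag names are assumptions and should be checked against benchmark.py's actual argument parsing:

```python
# Hypothetical invocation of benchmark.py -- flag names are assumptions,
# not confirmed against the repository's argument parser.
cmd = ["python", "benchmark.py", "--weights", "pretrained.h5", "--data", "Kvasir/test"]
print(" ".join(cmd))
```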