Open Johnsonj0308 opened 4 months ago
Hi @Johnsonj0308, thank you so much for your help in fixing the evaluation. I will look into this issue and post an update.
Hello, I saw on Papers with Code that your Dice coefficient currently ranks first, but I found the same problems in benchmark.py. I hope you can correct them promptly and respond. Thank you!
Issue Description
Hello, while using benchmark.py I noticed an anomaly: testing finished unusually fast. On closer inspection of benchmark.py, I identified a bug.
In your benchmark() function, BATCH_SIZE defaults to 32, but BATCH_SIZE is not set to 1 when the function is called, while the dataset itself is built with a batch size of 1. As a result, model.evaluate(test_dataset, steps=steps_per_epoch) runs steps = len_data // 32 instead of len_data // 1, so only a small fraction of the test data is ever read. In addition, because build_dataset is called without shuffle=False, the subset that gets evaluated changes on every run, so the reported performance varies between executions.
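The mismatch described above can be illustrated with plain arithmetic (the test-set size of 1000 below is a hypothetical number, not taken from the repository):

```python
# Demonstration of the evaluation bug: the dataset is batched with
# size 1, but `steps` is computed from the default BATCH_SIZE of 32,
# so model.evaluate() only ever sees a fraction of the test set.

len_data = 1000          # assumed size of the test split (hypothetical)
dataset_batch_size = 1   # batch size actually used when building the dataset
default_batch_size = 32  # BATCH_SIZE default inside benchmark()

# Steps as computed by benchmark.py (paired with the wrong batch size):
steps_per_epoch = len_data // default_batch_size      # 31

# Samples that evaluation actually covers:
samples_evaluated = steps_per_epoch * dataset_batch_size  # 31 of 1000

# Steps needed to cover the full test set at batch size 1:
correct_steps = len_data // dataset_batch_size        # 1000
```

With shuffling enabled, *which* 31 samples land in that evaluated slice differs on every run, which explains the run-to-run variance in the metrics.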
Fix
In benchmark(), set BATCH_SIZE to 1 so that it matches the batch size of the dataset, and pass shuffle=False when calling build_dataset so that evaluation is deterministic.
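A minimal sketch of the proposed fix, using plain-Python stand-ins for the repository's build_dataset and model (everything except the names BATCH_SIZE and steps_per_epoch is an assumption, not the repo's actual API):

```python
def build_dataset(samples, batch_size=1, shuffle=False):
    """Stand-in for the repo's build_dataset: slices a sample list
    into batches. shuffle=False keeps evaluation deterministic."""
    if shuffle:
        raise ValueError("evaluation should not shuffle the test set")
    return [samples[i:i + batch_size]
            for i in range(0, len(samples), batch_size)]

def benchmark(samples, BATCH_SIZE=1):  # fix: default to 1, matching the dataset
    test_dataset = build_dataset(samples, batch_size=BATCH_SIZE, shuffle=False)
    # steps_per_epoch now uses the same batch size the dataset was built with,
    # so evaluation covers the whole test set instead of a random fraction.
    steps_per_epoch = len(samples) // BATCH_SIZE
    n_seen = sum(len(batch) for batch in test_dataset[:steps_per_epoch])
    return steps_per_epoch, n_seen

steps, n_seen = benchmark(list(range(100)))
# steps == 100 and n_seen == 100: every test sample is evaluated exactly once
```

The key invariant is that the divisor in `steps_per_epoch` and the batch size passed to `build_dataset` come from the same variable, so they cannot drift apart again.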
Model Weights
I experimented with three sets of model weights:
Among these, option 1 (your provided pretrained weights) performed best.
Test Results Comparison (Kvasir)
| Metric | Before fix | After fix |
|---|---|---|
| dice_coeff | 0.9572 | 0.9049 |
| bce_dice_loss | 0.2784 | 0.3448 |
| IoU | 0.9183 | 0.8481 |
| zero_IoU | 0.9748 | 0.9700 |
| mean_squared_error | 0.0184 | 0.0222 |
Example Usage of benchmark.py
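The usage example itself did not survive in the thread. As a placeholder, the snippet below only assembles a hypothetical command line; the flag names are assumptions and should be checked against benchmark.py's actual argument parsing:

```python
# Hypothetical invocation of benchmark.py -- flag names are assumptions,
# not confirmed against the repository's argument parser.
cmd = ["python", "benchmark.py", "--weights", "pretrained.h5", "--data", "Kvasir/test"]
print(" ".join(cmd))
```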