voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License

Evaluation of TP and TN for Tusimple #172

Open CSenthusiast01 opened 7 months ago

CSenthusiast01 commented 7 months ago

Hello, I'm curious about calculating True Negatives (TN) within the bench method of lane.py, similar to how False Positives (FP) and False Negatives (FN) are computed. Could you please provide guidance on how to perform this calculation?

voldemortX commented 7 months ago

A brief intro: FP, FN, TP, and TN are counted lane-wise, and the terms are defined similarly to any other F1-style metric. For each GT lane, the closest predicted lane is found, and the condition for a TP is that at least 85% of its points have a pixel distance within 20 pixels of the GT. Also, at most 4 valid GT lanes are considered in one image.

@CSenthusiast01 For the exact logic, you'd better refer to the official evaluation code for details here: https://github.com/voldemortX/pytorch-auto-drive/blob/137e63a9e6c3cd2cfeb15e101808478d7b25ddbd/tools/tusimple_evaluation/lane.py#L33
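
For reference, here is a minimal sketch of the per-lane matching criterion described above (constants and missing-point handling paraphrased from the linked file; the angle-adjusted threshold used in the full implementation is omitted here):

import numpy as np

PIXEL_THRESH = 20  # max pixel distance for one point to count as matched
PT_THRESH = 0.85   # fraction of matched points required for a TP

def line_accuracy(x_pred, x_gt, thresh):
    # Fraction of sampled points whose predicted x falls within `thresh`
    # pixels of the GT x. Missing points (x == -2 in TuSimple) are mapped
    # to -100 as in the official code, so a point missing on only one
    # side can never match.
    x_pred = np.where(np.array(x_pred) >= 0, np.array(x_pred), -100)
    x_gt = np.where(np.array(x_gt) >= 0, np.array(x_gt), -100)
    return np.sum(np.abs(x_pred - x_gt) < thresh) / len(x_gt)

# A GT lane counts as a TP when its best-matching prediction reaches PT_THRESH:
# is_tp = max(line_accuracy(p, x_gt, PIXEL_THRESH) for p in pred) >= PT_THRESH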

CSenthusiast01 commented 7 months ago

Hi @voldemortX, thank you for your reply. I have tried to implement the TP and TN calculations based on your description. Here is the code snippet:

def bench(pred, gt, y_samples, running_time):
    if any(len(p) != len(y_samples) for p in pred):
        raise Exception('Format of lanes error.')
    if running_time > 200 or len(gt) + 2 < len(pred):
        return 0., 0., 1., 0., 0.
    # Angle-adjusted pixel threshold, one per GT lane.
    angles = [LaneEval.get_angle(np.array(x_gts), np.array(y_samples)) for x_gts in gt]
    threshs = [LaneEval.pixel_thresh / np.cos(angle) for angle in angles]
    line_accs = []
    tp, fp, fn, tn = 0., 0., 0., 0.
    matched = 0.
    for x_gts, thresh in zip(gt, threshs):
        # Best point-wise accuracy over all predictions for this GT lane.
        accs = [LaneEval.line_accuracy(np.array(x_preds), np.array(x_gts), thresh) for x_preds in pred]
        max_acc = np.max(accs) if len(accs) > 0 else 0.
        if max_acc < LaneEval.pt_thresh:
            fn += 1  # no prediction matched this GT lane
        else:
            tp += 1  # for TP
            matched += 1
        line_accs.append(max_acc)
    fp = len(pred) - matched  # unmatched predictions count as FP
    if len(gt) > 4 and fn > 0:
        fn -= 1  # the original benchmark forgives one miss when GT has more than 4 lanes
    tn = 4 - (tp + fp + fn)  # for TN, assuming 4 lane slots per image
    s = sum(line_accs)
    if len(gt) > 4:
        s -= min(line_accs)
    n_gt = max(min(len(gt), 4.), 1.)  # GT lanes considered, capped at 4
    n_pred = max(len(pred), 1)  # guard against division by zero
    return s / n_gt, fp / n_pred, fn / n_gt, tp / n_pred, tn / n_gt

Could you please check if this code is correct and give me some feedback? I appreciate your help and guidance.

voldemortX commented 7 months ago

@CSenthusiast01 I think there is a problem with TN. The others seem correct, but you might want to check some balance constraints, e.g., TP + FN = #GT.
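
A quick sanity check on the raw counts could look like this (a hypothetical helper, applied before the normalization in the return statement of the bench above):

def check_balance(tp, fp, fn, n_gt, n_pred):
    # Raw (un-normalized) counts from a single image. Every GT lane is either
    # matched (TP) or missed (FN); every prediction is either a match (TP)
    # or spurious (FP). Ignores the special case where len(gt) > 4.
    assert tp + fn == n_gt, 'TP + FN should equal the number of GT lanes'
    assert tp + fp == n_pred, 'TP + FP should equal the number of predictions'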

CSenthusiast01 commented 7 months ago

Can we use an equation involving the final TP, FN, and FP values to determine the True Negatives (TN) at the end of an evaluation, keeping the balance among TP, FN, and the ground truth, with something like TN = (4 - (TP + FN + FP)) / 4?

voldemortX commented 7 months ago

> Can we use an equation involving the final TP, FN, and FP values to determine the True Negatives (TN) at the end of an evaluation, keeping the balance among TP, FN, and the ground truth, with something like TN = (4 - (TP + FN + FP)) / 4?

I am not very good at math, but I believe it is possible. Note that GT is not necessarily 4 lanes; it could be 3.

CSenthusiast01 commented 7 months ago

Would it be more accurate to replace the fixed value 4 with the length of the ground truth (len(gt))?

voldemortX commented 7 months ago

> Would it be more accurate to replace the fixed value 4 with the length of the ground truth (len(gt))?

Yes. Are you sure the equation is correct?
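
Combining the proposed equation with that substitution would give something like this sketch (raw per-image counts assumed; as discussed further down, whether TN is even well-defined here is another question):

def tn_estimate(tp, fp, fn, n_gt):
    # Per-image TN from the equation above, with the fixed 4 replaced by the
    # number of GT lanes (capped at 4, since the benchmark considers at most 4).
    slots = max(min(n_gt, 4), 1)
    return max(slots - (tp + fp + fn), 0.) / slots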

CSenthusiast01 commented 7 months ago

I'm not 100% sure, but I got a reasonable value for TN while training the Ultra Fast lane detection model on a small TuSimple subset. I calculated it by substituting the TP, FP, and FN values directly from the testing result. [attached image: IMG_20240214_141538]

voldemortX commented 7 months ago

> I'm not 100% sure, but I got a reasonable value for TN while training the Ultra Fast lane detection model on a small TuSimple subset. I calculated it by substituting the TP, FP, and FN values directly from the testing result. [attached image: IMG_20240214_141538]

I don't know if that is reasonable. What is your denominator for this normalization? You seem to have decimal numbers for TP, FP, etc.

voldemortX commented 7 months ago

@CSenthusiast01 On second thought, I don't think TN is well-defined here, since we only have positive GT.

CSenthusiast01 commented 7 months ago

> @CSenthusiast01 On second thought, I don't think TN is well-defined here, since we only have positive GT.

Are there alternative approaches to determining the TN (True Negative) values for TuSimple? I'm curious whether there are other methods, apart from mathematical computation, or different strategies to acquire negative ground-truth values. Access to TN values would make plotting ROC curves and confusion matrices easier.

CSenthusiast01 commented 7 months ago

> I'm not 100% sure, but I got a reasonable value for TN while training the Ultra Fast lane detection model on a small TuSimple subset. I calculated it by substituting the TP, FP, and FN values directly from the testing result. [attached image: IMG_20240214_141538]

> I don't know if that is reasonable. What is your denominator for this normalization? You seem to have decimal numbers for TP, FP, etc.

The normalization follows the original evaluation code: the denominator is len(pred) for positives and len(gt) for negatives (in the bench method), and then all the metrics (FP, FN, TP) are further divided by len(gts) (the number of images) before being displayed in JSON format. I think these values are typically read as percentages, e.g., 0.71 corresponding to 71%.
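
In other words, the dataset-level numbers are means of the per-image values. The aggregation looks roughly like this (a sketch following the structure of the official evaluation loop; variable names are illustrative):

import json

def aggregate(per_image_results):
    # per_image_results: list of (accuracy, fp, fn) tuples from bench(),
    # one per image; dataset-level metrics are simple means over images.
    num = len(per_image_results)
    accuracy = sum(r[0] for r in per_image_results) / num
    fp = sum(r[1] for r in per_image_results) / num
    fn = sum(r[2] for r in per_image_results) / num
    return json.dumps([
        {'name': 'Accuracy', 'value': accuracy, 'order': 'desc'},
        {'name': 'FP', 'value': fp, 'order': 'asc'},
        {'name': 'FN', 'value': fn, 'order': 'asc'},
    ])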

voldemortX commented 7 months ago

@CSenthusiast01 Unfortunately, there is no definition of GT negatives in a typical detection task. This does not affect the calculation of common metrics like F1. A confusion matrix is more of a thing for classification/segmentation tasks.
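
To make that concrete: F1 is built entirely from precision and recall, neither of which involves TN, so the missing negatives do not matter (a minimal sketch):

def f1_score(tp, fp, fn):
    # precision = TP / (TP + FP); recall = TP / (TP + FN). TN never appears.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.
    recall = tp / (tp + fn) if tp + fn > 0 else 0.
    if precision + recall == 0:
        return 0.
    return 2 * precision * recall / (precision + recall)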